Bit hacking: Addition the bit hack way

Disclaimer: I would ask that none of this code actually be used in software; it doesn't offer anything, if your CPU has an ADD instruction, use it. Otherwise this is just a teaching aid.

Addition is a basic mathematical operation which can be emulated with some bitwise operations. Lets look at a canonical implementation:

uint8_t add(uint8_t x, uint8_t y) {

  uint8_t a;

  uint8_t b;

  do {

    a = x & y;

    b = x ^ y;

    x = a << 1;

    y = b;

  } while (a);

  return b;

}

To understand what is going on lets follow out with a basic example:
Let in:x = 00000011 (3)
Let in:y = 00000100 (4)

When we invoke this add function the steps taking place are:

AND -- a = (x&y)
XOR -- b = (x^y)
LSHFT -- x = (a<<1)

Resulting in the following:

x = 00000011
y = 00000100
a = 00000000
b = 00000100
x = 00000000
y = 00000111 -- XOR a:b = 7

This canonical implementation works, but it's slow. We can make it slightly faster, to see why, lets take a look at the assembly that code translates to (note: all code is x86_64 non-optimized)

add:

  pushq %rbp

  movq  %rsp,   %rbp

  movl  %edi,   %edx

  movl  %esi,   %eax

  movb  %dl -20(%rbp) ; y

  movb  %al -24(%rbp) ; x

  .loop:

    movzbl -24(%rbp),   %eax 

    movzbl -20(%rbp),   %edx

    andl       %edx,    %eax

movb %al, -1(%rbp)

movzbl -24(%rbp), %eax

movzbl -20(%rbp), %edx

xorl %edx, %eax

movb %al, -2(%rbp)

movzbl -1(%rbp), %eax

addl %eax, %eax

movb %al, -20(%rbp)

movzbl -2(%rbp), %eax

movb %al, -24(%rbp)

cmpb $0x0, -1(%rbp)

jne .loop

movzbl -2(%rbp), %eax

popq %rbp

ret

There is 9+(15*iterations) worth instructions here. Assuming the best case senerio, that is 24 instructions. We can refine our implementation to be slightly faster.

void add(uint8_t *x, uint8_t *y) {

  do {

*y = (*x^*y);
*x = (*x&(*x^*y)) << 1;

  } while (*x);

}

In this version we've removed the return, and replaced our arguments with pointers to our variables, storing the result of the addition in *y, and using no temporaries!. This is (best case) 14! instruction less:

add:

  movzbl (%rsi), %eax

  .loop:

    movzbl (%rdi),      %ecx

    movl    %eax,       %edx

    andl    %ecx,       %edx

    xorl    %ecx,       %eax

    leal   (%rdx,%rdx), %ecx

    testb   %dl,        %dl

    movb    %cl,       (%rdi)

    movb    %al,       (%rsi)

    jne .loop

rep

  ret

Of course since this function is small we can most likely make it inline, (which could allow for better optimization).  I'd personally perfer a macro before the inline keyword.

#define add(x,y) do{*y=(*x^*y),*x=(*x&(*x^*y))<<1;} while(*x)

That concludes addition by bit hacking. I've played around with many other methods, but this is 3+(9*iterations) worth instructions, or best case (11 instructions). I challenge anyone to beat this for best case.  Must be C code (no assembly) .. post solution in the comments.  Hint: abuse stack.

Bit hacking

Friday, April 6, 2012

Addition the bit hack way

No comments:

Post a Comment