Disclaimer: I would ask that none of this code actually be used in software; it doesn't offer anything, if your CPU has an ADD instruction, use it. Otherwise this is just a teaching aid.
Addition is a basic mathematical operation which can be emulated with some bitwise operations. Lets look at a canonical implementation:
uint8_t add(uint8_t x, uint8_t y) {
uint8_t a;
uint8_t b;
do {
a = x & y;
b = x ^ y;
x = a << 1;
y = b;
} while (a);
return b;
}To understand what is going on lets follow out with a basic example:
Let in:x = 00000011 (3)
Let in:y = 00000100 (4)
When we invoke this add function the steps taking place are:
- AND -- a = (x&y)
- XOR -- b = (x^y)
- LSHFT -- x = (a<<1)
Resulting in the following:
- x = 00000011
- y = 00000100
- a = 00000000
- b = 00000100
- x = 00000000
- y = 00000111 -- XOR a:b = 7
This canonical implementation works, but it's slow. We can make it slightly faster, to see why, lets take a look at the assembly that code translates to (note: all code is x86_64 non-optimized)
add:
pushq %rbp
movq %rsp, %rbp
movl %edi, %edx
movl %esi, %eax
movb %dl -20(%rbp) ; y
movb %al -24(%rbp) ; x
.loop:
movzbl -24(%rbp), %eax
movzbl -20(%rbp), %edx
andl %edx, %eax
movb %al, -1(%rbp)
movzbl -24(%rbp), %eax
movzbl -20(%rbp), %edx
xorl %edx, %eax
movb %al, -2(%rbp)
movzbl -1(%rbp), %eax
addl %eax, %eax
movb %al, -20(%rbp)
movzbl -2(%rbp), %eax
movb %al, -24(%rbp)
cmpb $0x0, -1(%rbp)
jne .loop
movzbl -2(%rbp), %eax
popq %rbp
ret There is 9+(15*iterations) worth instructions here. Assuming the best case senerio, that is 24 instructions. We can refine our implementation to be slightly faster.
void add(uint8_t *x, uint8_t *y) {
do {
*y = (*x^*y);*x = (*x&(*x^*y)) << 1;
} while (*x);
}In this version we've removed the return, and replaced our arguments with pointers to our variables, storing the result of the addition in *y, and using no temporaries!. This is (best case) 14! instruction less:
add:
movzbl (%rsi), %eax
.loop:
movzbl (%rdi), %ecx
movl %eax, %edx
andl %ecx, %edx
xorl %ecx, %eax
leal (%rdx,%rdx), %ecx
testb %dl, %dl
movb %cl, (%rdi)
movb %al, (%rsi)
jne .loop
rep
ret
That concludes addition by bit hacking. I've played around with many other methods, but this is 3+(9*iterations) worth instructions, or best case (11 instructions). I challenge anyone to beat this for best case. Must be C code (no assembly) .. post solution in the comments. Hint: abuse stack.
Of course since this function is small we can most likely make it inline, (which could allow for better optimization). I'd personally perfer a macro before the inline keyword.
#define add(x,y) do{*y=(*x^*y),*x=(*x&(*x^*y))<<1;} while(*x)
That concludes addition by bit hacking. I've played around with many other methods, but this is 3+(9*iterations) worth instructions, or best case (11 instructions). I challenge anyone to beat this for best case. Must be C code (no assembly) .. post solution in the comments. Hint: abuse stack.