Bit hacking: 2013

The problem with user-defined boolean types in C.

It has become apparent that a lot of people prefer to define their own
boolean types in C, especially if the target compiler lacks a _Bool
type. The problem is that every version of it is fundamentally flawed.
Let's look at some typical examples:

    typedef int bool; // wrong
    typedef unsigned int bool; // wrong
    typedef char bool; // wrong
    typedef unsigned char bool; // wrong

In fact, bool cannot be represented by a typedef at all, and I'll explain
why shortly. But let's first assume the most typical implementation and
look at the problems that can arise from it. The most typical typedef I've
seen for boolean is int. Consider the following code, which is intended to
be shared between C and C++.

    // I see this ALL the time
    #ifndef __cplusplus
    typedef int bool; // harmless looking
    #endif

    struct foo { 
        bool a;
        bool b;
    };

When this is compiled as C, the size of this structure (assuming 4-byte
ints) is 8. In C++, however, sizeof(bool) is 1 on virtually every compiler,
so you'd expect the structure to be 2 bytes there. That is exactly what
happens: the two layouts do not match in memory at all. People get away
with this sort of thing in other structures because of typical computing
architectures. On my x86, a bool that is followed by a 4-byte-aligned
member gets padded out to 4 bytes, and the low-order byte always comes
first. Those are the only two architecture-specific factors keeping such a
structure from blowing up in your face.

Some would argue that this is a far reach, but there is an even bigger
issue with this typedef, one that only manifests when you're trying to
conserve memory with bit-fields. Let's look at a typical example in C.

    typedef int bool;
    struct {
        bool a : 1;
    } foo;

    int main(void) {
        foo.a = 1; // what's wrong with this code?
        return 0;
    }

This innocent-looking code has more problems than you'd expect. Mainly,
the constant 1 implicitly overflows in the assignment `foo.a = 1`.
With warnings enabled, GCC reports:

    foo.c:7:5: warning: overflow in implicit constant conversion [-Woverflow]
         foo.a = 1;

The reason is the semantics of a signed integer truncated to 1 bit: you get
just the sign bit, so the field can hold -1 or 0, not 0 and 1. (Whether a
plain int bit-field is signed is implementation-defined; GCC treats it as
signed.) This comes as a surprise to many, because the use of _Bool in
bit-fields is explicitly allowed by C. To quote the standard:

`A bit-field is interpreted as a signed or unsigned integer type consisting
 of the specified number of bits. If the value 0 or 1 is stored into a 
 nonzero-width bit-field of type _Bool, the value of the bit-field shall
 compare equal to the value stored.`

You could technically fix that issue by using:

    typedef unsigned int bool;

But then you still have the C++ issue with sizeof(bool) for forward
compatibility, plus another issue that just causes more pain. Now that
bool is unsigned int, your code becomes littered with comparisons between
signed values (bool in C++ promotes to int, which is signed) and unsigned
values (the typedef'd bool). These comparisons cause a ton of warnings with
any decent compiler, and if your project is pedantic about code correctness
and treats warnings as errors, they will break the build.

As for the other variations of the typedef, they break bit-fields. The only
types the standard guarantees for a bit-field are signed int, unsigned int
and _Bool. Anything else is implementation-defined, so it is not portable
(GCC does accept char-based bit-fields as an extension).

To reiterate the main premise: _Bool simply cannot be represented by a
typedef if you want its semantics to be correct, which is why it was added
as a type in the first place. So if you want to do booleans in C, just be
explicit about using integers, and for bit-fields be explicit about using
unsigned. The exception is when you're targeting C99 or later, where
you're guaranteed _Bool.

Saturday, July 27, 2013