Henri Menke

Reputation: 10939

Accessing the bits in char through a bitfield

I want to access the bits in a char individually. There are several questions and answers on this topic here on SO, but they all suggest using Boolean arithmetic (bitwise operators). For my use, however, it would be more convenient if I could simply give each bit its own name. So I was thinking of just accessing the char through a bit-field, like so:

#include <stdbool.h>
#include <stdio.h>

typedef struct {
    bool _1 : 1, _2 : 1, _3 : 1, _4 : 1, _5 : 1, _6 : 1, _7 : 1, _8 : 1;
} bits;

int main() {
    char c = 0;
    bits *b = (bits *)&c;
    b->_3 = 1;
    printf("%s\n", c & 0x4 ? "true" : "false");
}

This compiles without errors or warnings with gcc -Wall -Wextra -Wpedantic test.c. Running the resulting executable under Valgrind reports no memory errors. The assembly generated for the b->_3 = 1; assignment is or eax, 4, which is sound.

Questions:

Is this defined behaviour in C?
Is this defined behaviour in C++?

N.B.: I'm aware that this might cause trouble on mixed-endian systems, but I only target little-endian ones.

Upvotes: 1

Views: 559

Answers (1)

Lundin

Reputation: 215360

Is this defined behaviour in C?
Is this defined behaviour in C++?

TL;DR: no it is not.

The boolean bit-field is well-defined only in this sense: bool is a permitted type for bit-fields, so you are guaranteed to get a blob of 8 booleans allocated somewhere in memory. If you read _1, you'll get the same value you last stored to it.

What is not defined is the bit order: whether bit-fields are allocated starting from the most or the least significant bit is implementation-defined, and the compiler may insert padding bits or padding bytes as it pleases. All of that is non-portable. So you can't really know where _1 is located in memory, or whether it is the MSB or the LSB. None of that is well-defined.

However, bits *b = (bits *)&c; accesses a char through a struct pointer, which is a strict aliasing violation and may also cause alignment problems. It is undefined behavior in both C and C++. You would at least need to put this struct into a union with a char to dodge strict aliasing, but you may still get alignment hiccups (and C++ frowns upon type punning through unions).

(Going from a boolean type to a character type can also give some really crazy results; see the question "_Bool type and strict aliasing".)


None of this is convenient at all; bit-fields are very poorly defined. It is much better to simply do:

c |= 1u << n;     // set bit n
c &= ~(1u << n);  // clear bit n

This is portable, type-generic and endianness-independent.

(To avoid an unintended change of signedness due to implicit integer promotions, it is good practice to always cast the result of ~ back to the intended type: c &= (uint8_t)~(1u << n);)

Note that the type char is entirely unsuitable for bitwise arithmetic, since whether it is signed is implementation-defined. Use unsigned char or, preferably, uint8_t instead.

Upvotes: 4
