budoattack

Reputation: 475

C bitfield with assigned value 1 shows -1

I was playing with bit-fields and got stuck on something strange:

#include <stdio.h>

struct lol {
     int a;
     int b:1,
         c:1,
         d:1,
         e:1;
     char f;
};

int main(void) {
     struct lol l = {0};
     l.a = 123;
     l.c = 1; // -1 ???
     l.f = 'A';

     printf("%d %d %d %d %d %c\n", l.a, l.b, l.c, l.d, l.e, l.f);

     return 0;
}

The output is:

123 0 -1 0 0 A

Somehow the value of l.c is -1. What is the reason?
Sorry if obvious.

Upvotes: 1

Views: 436

Answers (3)

Petr Skocik

Reputation: 60107

Use unsigned bitfields if you don't want sign-extension.

What you're getting is your 1 bit being interpreted as the sign bit in a two's complement representation. In two's complement, the sign-bit is the highest bit and it's interpreted as -(2^(width_of_the_number-1)), in your case -(2^(1-1)) == -(2^0) == -1. Normally all other bits offset this (because they're interpreted as positive) but a 1-bit number doesn't and can't have other bits, so you just get -1.

Take for example 0b10000000 as an int8_t in two's complement. (For the record, 0b10000000 == 0x80 && 0x80 == (1<<7)). It's the highest bit, so it's interpreted as -(2^7) (== -128), and there are no positive bits to offset it, so you get printf("%d\n", (int8_t)0x80); /*-128*/

Now if you set all bits on, you get -1, because -128 + (128-1) == -1. This (all bits on == -1) holds true for any width interpreted as two's complement, even for width 1, where you get -1 + (1-1) == -1.
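To make the bit weighting concrete, here is a minimal sketch. The helper name twos_complement_value is hypothetical (it is not part of any library), and it assumes a width between 1 and 32:

```c
#include <stdint.h>

/* Hypothetical helper: interprets the low `width` bits of `bits` as a
   two's complement number. Assumes 1 <= width <= 32. */
int64_t twos_complement_value(uint32_t bits, unsigned width) {
    int64_t value = 0;

    /* Every bit below the top one contributes +2^i. */
    for (unsigned i = 0; i + 1 < width; i++) {
        if (bits & (UINT32_C(1) << i))
            value += (int64_t)1 << i;
    }

    /* The top (sign) bit contributes -(2^(width-1)). */
    if (bits & (UINT32_C(1) << (width - 1)))
        value -= (int64_t)1 << (width - 1);

    return value;
}
```

With this, twos_complement_value(0x80, 8) gives -128, twos_complement_value(0xff, 8) gives -1, and twos_complement_value(1, 1) gives -1, which is the bit-field case from the question.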

When such a signed integer gets extended into a wider width, it undergoes so-called sign extension.

Sign extension means that the highest bit gets copied into all the newly added higher bits.

If the highest bit is 0, then it's trivial to see that sign extension doesn't change the value (take for example 0x01 extended into 0x00000001).

When the highest bit is 1 as in (int8_t)0xff (all 8 bits 1), then sign extension copies the sign bit into all the new bits: ((int32_t)(int8_t)0xff == (int32_t)0xffffffff). ((int32_t)(int8_t)0x80 == (int32_t)0xffffff80) might be a better example as it more clearly shows the 1 bits are added at the high end (try _Static_assert-ing either of these).

This doesn't change the value either as long as you assume two's complement, because if you start at:

-(2^n) (value of sign bit) + X (all the positive bits) //^ means exponentiation here

and add one more 1-bit to the highest position, then you get:

-(2^(n+1)) + 2^(n) +  X

which is

2*-(2^(n)) + 2^(n) +  X == -(2^n) + X //same as original
//inductively, you can add any number of 1 bits

Sign extension normally happens when you width-extend a native signed integer into a native wider width (signed or unsigned), either with casts or implicit conversions. For the native widths, platforms usually have an instruction for it.

Example:

int32_t signExtend8(int8_t X) { return X; }

Disassembly of the example on x86_64:

signExtend8:
        movsx   eax, dil //the sx stands for Sign-eXtending
        ret

If you want to make it work for nonstandard widths, you can usually rely on the fact that signed right shifts normally copy the sign bit into the vacated high bits (strictly speaking, what a right shift of a negative signed value does is implementation-defined). So you can do an unsigned left shift into the sign bit and then a signed right shift back to get sign extension artificially for a non-native width such as 2:

#include <stdint.h>
#define MC_signExtendIn32(X,Width) ((int32_t)((uint32_t)(X)<<(32-(Width)))>>(32-(Width)))
_Static_assert( MC_signExtendIn32(3,2 /*width 2*/)==-1,"");
int32_t signExtend2(int8_t X) { return MC_signExtendIn32(X,2); }

Disassembly (x86_64):

signExtend2:
        mov     eax, edi
        sal     eax, 30
        sar     eax, 30
        ret

Signed bitfields essentially make the compiler generate (hidden) macros like the above for you:

struct bits2 { int bits2:2; };
int32_t signExtend2_via_bitfield(struct bits2 X) { return X.bits2; }

Disassembly (x86_64) on clang:

signExtend2_via_bitfield:               # @signExtend2_via_bitfield
        mov     eax, edi
        shl     eax, 30
        sar     eax, 30
        ret

Example code on godbolt: https://godbolt.org/z/qxd5o8 .

Upvotes: 4

Lundin

Reputation: 214515

Bit-fields are very poorly standardized and they are generally not guaranteed to behave predictably. The standard just vaguely states (6.7.2.1/10):

A bit-field is interpreted as having a signed or unsigned integer type consisting of the specified number of bits.125)

Where the informative note 125) says:

125) As specified in 6.7.2 above, if the actual type specifier used is int or a typedef-name defined as int, then it is implementation-defined whether the bit-field is signed or unsigned.

So we can't know whether int b:1 gives a signed or an unsigned type; it's up to the compiler. Your compiler apparently decided that it would be a great idea to have signed bits, so it treats your single bit as a 1-bit two's complement number, where binary 1 is decimal -1 and zero is zero.


Furthermore, we can't know where b in your code ends up in memory; it could be anywhere, and it also depends on endianness. What we do know is that you save absolutely no memory by using a bit-field here, since at least 16 bits for an int will get allocated anyway.

General good advice:

  • Never use bit-fields for any purpose.
  • Use the bit-wise operators << >> | & ^ ~ and named bit-masks instead, for 100% portable and well-defined code.
  • Use the stdint.h types or at least unsigned ones whenever dealing with raw binary.
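As a rough sketch of the bit-mask approach from the list above: the flag names FLAG_B to FLAG_E and the three helper functions are hypothetical, chosen here only to mirror the fields b, c, d and e from the question.

```c
/* Hypothetical named masks standing in for the fields b, c, d, e. */
#define FLAG_B (1u << 0)
#define FLAG_C (1u << 1)
#define FLAG_D (1u << 2)
#define FLAG_E (1u << 3)

/* Returns `flags` with every bit in `mask` switched on. */
unsigned set_flag(unsigned flags, unsigned mask) {
    return flags | mask;
}

/* Returns `flags` with every bit in `mask` switched off. */
unsigned clear_flag(unsigned flags, unsigned mask) {
    return flags & ~mask;
}

/* Returns 1 if any bit in `mask` is set in `flags`, otherwise 0. */
int test_flag(unsigned flags, unsigned mask) {
    return (flags & mask) != 0;
}
```

Because everything is unsigned, test_flag(set_flag(0u, FLAG_C), FLAG_C) yields 1 rather than -1, and the position of each flag inside the word is fixed by the mask definitions instead of by the compiler.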

Upvotes: 1

Nastor

Reputation: 638

You are using a signed integer type for the bit-field. A 1-bit field has only one bit, so storing 1 sets what is also the sign bit, and in a signed (two's complement) representation that bit pattern reads back as -1. As others suggest, declare the field with the unsigned keyword so that it cannot represent negative integers.
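A minimal sketch of that fix, applied to the struct from the question. The names lol_unsigned and read_back_c are hypothetical and used only for illustration:

```c
/* Same layout as the question's struct, but the one-bit fields are
   explicitly unsigned, so each can hold exactly 0 or 1. */
struct lol_unsigned {
    int a;
    unsigned int b:1,
                 c:1,
                 d:1,
                 e:1;
    char f;
};

/* Stores 1 in the one-bit field c and returns the value read back. */
int read_back_c(void) {
    struct lol_unsigned l = {0};
    l.c = 1;
    return l.c;   /* an unsigned 1-bit field has no sign bit to extend */
}
```

Here read_back_c() returns 1, so printing l.c with %d as in the original program would show 1 instead of -1.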

Upvotes: 0
