tonihele
tonihele

Reputation: 49

Sign extension, bit shifting in JAVA. Help understanding a C-code bit

I have the following C-code (from FFMPEG):

static inline av_const int sign_extend(int val, unsigned bits)
{
    unsigned shift = 8 * sizeof(int) - bits;
    union { unsigned u; int s; } v = { (unsigned) val << shift };
    return v.s >> shift;
}

I'm trying to reproduce this in JAVA. But I have difficulties understanding this. No matter how I toss the bits around, I don't get very close.

As for the value parameter: it takes unsigned byte value as int.

Bits parameter: 4

If the value is 255 and bits is 4. It returns -1. I can't reproduce this in JAVA. Sorry for such fuzzy question. But can you help me understand this code?

The big picture is that I'm trying to encode EA ADPCM audio in JAVA. In FFMPEG: https://gitorious.org/ffmpeg/ffmpeg/source/c60caa5769b89ab7dc4aa41a21f87d5ee177bd30:libavcodec/adpcm.c#L981

Upvotes: 1

Views: 3707

Answers (3)

user3629249
user3629249

Reputation: 16550

given this code:

static inline av_const int sign_extend(int val, unsigned bits)
{
    unsigned shift = 8 * sizeof(int) - bits;
    union { unsigned u; int s; } v = { (unsigned) val << shift };
    return v.s >> shift;
}

the 'static' modifier says the function is not visible outside the current file.

The 'inline' modifier is a 'request' to the compiler to place the code 'inline' whereever the function is called rather than having a separate function with the associated call/return code sequences

the 'sign_extend' is the name of the function

 in C, a right shift, for a signed value will propagate the sign bit,
 In C, a right shift, for a unsigned value will zero fill.
 It looks like your java is doing the zero fill.

 regarding this line: 
 unsigned shift = 8 * sizeof(int) - bits;
 on a 32bit machine, an integer is 32 bits and size of int is 4
 so the variable 'shift' will contain (8*4)-bits

regarding this line:
union { unsigned u; int s; } v = { (unsigned) val << shift };
 left shift of unsigned will shift the bits left,
 with the upper bits being dropped into the bit bucket
 and the lower bits being zero filled.

regarding this line:
return v.s >> shift;
this shifts the bits back to their original position,
while propagating the (new) sign bit

Upvotes: 0

The reason why the code looks so odd is that C language is full of undefined behaviours that in Java are well-defined. For example in C bit-shifting a signed integer left so that the sign-bit changes is undefined behaviour and at that point the program can do anything - whatever the compiler causes the program to do - crash, print 42, make true = false, anything can happen, and the compiler still compiled it correctly.

Now the code uses a 1 trick to shift the integer left: it uses an union that lays the bytes of members top of each other - making an unsigned and an signed integer to occupy the same bytes; the bitshift is defined with the unsigned integer; so we do the unsigned shift using it; then shift back using the signed shift (the code assumes that the right shift of a negative number produces properly sign-extended negative numbers, which is also not guaranteed by standard but usually these kinds of libraries have a configuration utility that can refuse compilation on such a quite esoteric platform; likewise this program assumes that CHAR_BIT is 8; however C only makes a guarantee that a char is at least 8 bits wide.

In Java, you do not need anything like a union to accomplish this; instead you do:

static int signExtend(int val, int bits) {
    int shift = 32 - bits;  // fixed size
    int v = val << shift;
    return v >> shift;
}

In Java the width of an int is always 32 bits; << can be used for both signed and unsigned shift; and there is no undefined behaviour for extending to the sign bit; >> can be used for signed shift (>>> would be unsigned).

Upvotes: 1

Wintermute
Wintermute

Reputation: 44073

Strictly speaking, the result of running this code with this input data has unspecified results because signed bitshift in C is only properly defined in circumstances that this scenario does not meet. From the C99 standard:

The result of E1 >> E2 is E1 right-shifted E2 bit positions. If E1 has unsigned type or if E1 has signed type and a nonnegative value, the value of the result is the integral part of the quotient of E1 / 2E2. If E1 has a signed type and negative value, the resulting value is implementation-defined.

(Emphasis mine)

But let's assume that your implementation defines signed right shift to extend the sign, meaning that the space on the left will be filled with ones if the sign bit is set and zeroes otherwise; the ffmpeg code clearly expects this to be the case. The following is happening: shift has the value of 28 (assuming 32-bit integers). In binary notation:

00000000 00000000 00000000 11111111 = val
11110000 00000000 00000000 00000000 = (unsigned) val << shift

Note that when interpreting (unsigned) val << shift as signed integer, as the code proceeds to do (assuming two's complement representation, as today's computers all use1), the sign bit of that integer is set, so a signed shift to the right fills up with zeroes from the left, and we get

11110000 00000000 00000000 00000000 = v.s
11111111 11111111 11111111 11111111 = v.s >> shift

...and in two's complement representation, that is -1.

In Java, this trick works the same way -- except better, because there the behavior is actually guaranteed. Simply:

public static int sign_extend(int val, int bits) {
  int shift = 32 - bits;  // int always has 32 bits in Java
  int s = val << shift;
  return s >> shift;
}

Or, if you prefer:

public static int sign_extend(int val, int bits) {
  int shift = 32 - bits;
  return val << shift >> shift;
}

1 Strictly speaking, this conversion does not have a well-defined value in the C standard either, for historical reasons. There used to be computers that used different representations, and the same bit pattern with a set sign bit has a completely different meaning in (for example) signed magnitude representation.

Upvotes: 4

Related Questions