Curious

Reputation: 111

How to divide an Int into two Bytes in C?

I am working on embedded software for minimal hardware that only supports ANSI C and has cut-down versions of the standard I/O libraries.

I have an int variable, two bytes in size, and I need to split it into 2 separate bytes so that I can transmit it. The receiver then reads the two bytes and reassembles the original int.

I imagine some kind of binary split into bytes, like this:

int valor = 522;  // 0000 0010 0000 1010 (2-byte integer)
byte superior = byteSuperior(valor);  // 0000 0010
byte inferior = byteInferior(valor);  // 0000 1010
...
int valorRestaurado = bytesToInteger(superior, inferior); // 522

but I cannot find a simple way of splitting the integer by byte weight. I have the feeling it should be trivial (with bit shifting, for example), yet I can't work it out.

In fact, any solution that splits the integer into 2 bytes and reassembles it would suit me.

Thanks in advance!

Upvotes: 3

Views: 15473

Answers (7)

Steve Summit

Reputation: 48010

As you can see from the several answers so far, there are multiple approaches, and some perhaps surprising subtleties.

  1. "Mathematical" approach. You separate the bytes using shifting and masking (or, equivalently, division and remainder), and recombine them similarly. This is "option 1" in Felix Palmen's answer. This approach has the advantage that it is completely independent of "endianness" issues. It has the complication that it's subject to some sign-extension and implementation-definedness issues. It's safest if you use an unsigned type for both the composite int and the byte-separated parts of the equation. If you use signed types, you'll typically need extra casts and/or masks. (But with that said, this is the approach I prefer.)

  2. "Memory" approach. You use pointers, or a union, to directly access the bytes making up an int. This is "option 2" in Felix Palmen's answer. The very significant issue here is byte order, or "endianness". Also, depending on how you implement it, you may run afoul of the "strict aliasing" rule.

If you use the "mathematical" approach, make sure you test it on values that both do and don't have the high bit of the various bytes set. For example, for 16 bits, a complete set of tests might include the values 0x0101, 0x0180, 0x8001, and 0x8080. If you don't write the code correctly (if you implement it using signed types, or if you leave out some of the otherwise necessary masks), you will typically find extra 0xff's creeping into the reconstructed result, corrupting the transmission. (Also, you might want to think about writing a formal unit test, so that you can maximize the likelihood that the code will be re-tested, and any latent bugs detected, if/when it's ported to a machine which makes different implementation choices which affect it.)
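A round-trip check along those lines might look like the sketch below. The helper names `split16`, `join16` and `roundtrip_ok` are made up for illustration, and the code assumes 8-bit bytes and values that fit in 16 bits.

```c
/* Hypothetical helpers: split a 16-bit value into two bytes and rebuild it.
   Unsigned types are used throughout, so no sign extension can occur. */
static void split16(unsigned value, unsigned char *hi, unsigned char *lo)
{
    *hi = (unsigned char)((value >> 8) & 0xffu);
    *lo = (unsigned char)(value & 0xffu);
}

static unsigned join16(unsigned char hi, unsigned char lo)
{
    /* Cast before shifting so the shift happens in unsigned arithmetic. */
    return ((unsigned)hi << 8) | (unsigned)lo;
}

/* Returns 1 if every suggested test value survives the round trip, else 0. */
static int roundtrip_ok(void)
{
    static const unsigned tests[] = { 0x0101u, 0x0180u, 0x8001u, 0x8080u };
    unsigned i;

    for (i = 0; i < sizeof tests / sizeof tests[0]; i++) {
        unsigned char hi, lo;
        split16(tests[i], &hi, &lo);
        if (join16(hi, lo) != tests[i])
            return 0;
    }
    return 1;
}
```

Calling `roundtrip_ok()` from a unit test makes it cheap to re-run the check whenever the code is ported.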

If you do want to transmit signed values, you will have a few additional complications. In particular, if you reconstruct your 16-bit integer on a machine where type int is bigger than 16 bits, you may have to explicitly sign extend it to preserve its value. Again, comprehensive testing should ensure that you've adequately addressed these complications (at least on the platforms where you've tested your code so far :-) ).
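One possible way to do the explicit sign extension is sketched here, assuming the transmitted value is a 16-bit two's complement number. The function name `decode_s16` is hypothetical, and the result is returned as `long` so that every 16-bit value fits regardless of how wide `int` is on the receiving machine.

```c
/* Hypothetical helper: rebuild a signed 16-bit two's complement value from
   its two bytes. Works the same whether int is 16 bits or wider. */
static long decode_s16(unsigned char hi, unsigned char lo)
{
    unsigned long u = ((unsigned long)hi << 8) | (unsigned long)lo;

    if (u & 0x8000ul)              /* sign bit of the 16-bit value is set */
        return (long)u - 65536L;   /* subtract 2^16 to get the negative value */
    return (long)u;
}
```

With this sketch, the bytes 0x80 and 0x01 decode to -32767 rather than 32769.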

Going back to the test values I suggested (0x0101, 0x0180, 0x8001, and 0x8080), if you're transmitting unsigned integers, these correspond to 257, 384, 32769, and 32896. If you're transmitting signed integers, they correspond to 257, 384, -32767, and -32640. And if on the other end you get values like -255 or 65281 (which correspond to hexadecimal 0xff01), or if you get 32896 when you expected -32640, it indicates that you need to go back and be more careful with your signed/unsigned usage, with your masking, and/or with your explicit sign extension.

Finally, if you use the "memory" approach, and if your sending and receiving code runs on machines of different byte orders, you'll find the bytes swapped. 0x0102 will turn into 0x0201. There are various ways to solve this, but it can be quite a nuisance. (This is why, as I said, I usually prefer the "mathematical" approach, so I can just sidestep the byte order problem.)
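If you do stay with the "memory" approach, one of those ways is to detect the host byte order and swap when needed, so the transmitted order is fixed. The sketch below assumes `unsigned short` is exactly 2 bytes; the names `host_is_little_endian` and `to_wire` are invented for this example.

```c
#include <string.h>

/* Returns 1 on a host that stores the low-order byte first, else 0. */
static int host_is_little_endian(void)
{
    unsigned short probe = 0x0102u;
    unsigned char first;

    memcpy(&first, &probe, 1);     /* look at the first byte in memory */
    return first == 0x02;
}

/* Copy the in-memory bytes of a 16-bit value into out[0..1], swapping on
   little-endian hosts so that out[0] is always the high byte.
   Assumption: sizeof(unsigned short) == 2. */
static void to_wire(unsigned short value, unsigned char out[2])
{
    memcpy(out, &value, 2);
    if (host_is_little_endian()) {
        unsigned char t = out[0];
        out[0] = out[1];
        out[1] = t;
    }
}
```

The receiver would apply the same swap in reverse before copying the bytes back into its own variable.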

Upvotes: 2

user2371524

Reputation:

This isn't a "simple" task.

First of all, the data type for a byte in C is char. You probably want unsigned char here, because whether plain char is signed or unsigned is implementation-defined.

int is a signed type, which makes right-shifting it implementation-defined as well. As far as C is concerned, int must have at least 16 bits (which would be 2 bytes if char has 8 bits), but can have more. But as your question is written, you already know that int on your platform has 16 bits. Using this knowledge in your implementation means your code is specific to that platform and not portable.

As I see it, you have two options:

  1. You can work on the value of your int using masking and bit-shifting, something like:

    int foo = 42;
    unsigned char lsb = (unsigned)foo & 0xff; // mask the lower 8 bits
    unsigned char msb = (unsigned)foo >> 8;   // shift the higher 8 bits
    

    This has the advantage that you're independent of the layout of your int in memory. For reconstruction, do something like:

    int rec = (int)(((unsigned)msb << 8) | lsb );
    

    Note that casting msb to unsigned here is necessary: otherwise it would be promoted to int (int can represent all values of an unsigned char), which could overflow when shifted by 8 places. As you already stated that your int has "two bytes", this would be very likely in your case.

    The final cast to int is implementation-defined as well, but will work on your "typical" platform with a 16-bit int in 2's complement, if the compiler doesn't do something "strange". By checking first whether the unsigned is too large for an int (because the original int was negative), you could avoid this, e.g.

    unsigned tmp = ((unsigned)msb << 8 ) | lsb;
    int rec;
    if (tmp > INT_MAX)
    {
        tmp = ~tmp + 1; // 2's complement
        if (tmp > INT_MAX)
        {
            // only possible when implementation uses 2's complement
            // representation, and then only for INT_MIN
            rec = INT_MIN;
        }
        else
        {
            rec = tmp;
            rec = -rec;
        }
    }
    else
    {
        rec = tmp;
    }
    

    The 2's complement is fine here, because the rules for converting a negative int to unsigned are explicitly stated in the C standard.

  2. You can use the representation in memory, like:

    int foo = 42;
    unsigned char *rep = (unsigned char *)&foo;
    unsigned char first = rep[0];
    unsigned char second = rep[1];
    

    But beware: whether first is the MSB or the LSB depends on the endianness of your machine. Also, if your int contains padding bits (extremely unlikely in practice, but allowed by the C standard), you will read them as well. For reconstruction, do something like:

    int rec;
    unsigned char *recrep = (unsigned char *)&rec;
    recrep[0] = first;
    recrep[1] = second;
    

Upvotes: 7

Eric Postpischil

Reputation: 223464

Given that an int is two bytes, and the number of bits per byte (CHAR_BIT) is eight, and two’s complement is used, an int named valor may be disassembled into endian-agnostic order with:

unsigned x;
memcpy(&x, &valor, sizeof x);
unsigned char Byte0 = x & 0xff;
unsigned char Byte1 = x >> 8;

and may be reassembled from unsigned char Byte0 and unsigned char Byte1 with:

unsigned x;
x = (unsigned) Byte1 << 8 | Byte0;
memcpy(&valor, &x, sizeof valor);

Notes:

  • int and unsigned have the same size and alignment per C 2011 (N1570) 6.2.5 6.
  • There are no padding bits for unsigned in this implementation, as C requires UINT_MAX to be at least 65535, so all 16 bits are needed for value representation.
  • int and unsigned have the same endianness per 6.2.6.2 2.
  • If the implementation is not two’s complement, values reassembled in the same implementation will restore the original values, but negative values will not be interoperable with implementations using different sign-bit semantics.

Upvotes: 1

Steve Summit

Reputation: 48010

I wouldn't even write functions to do this. Both operations are straightforward applications of C's bitwise operators:

int valor = 522;
unsigned char superior = (valor >> 8) & 0xff;
unsigned char inferior = valor & 0xff;

int valorRestaurado = (superior << 8) | inferior;

Although it looks straightforward, there are always a few subtleties when writing code like this, and it's easy to get it wrong. For example, since valor is signed, shifting it right using >> is implementation-defined, although typically what that means is that it might sign extend or not, which won't end up affecting the value of the byte that & 0xff selects and assigns to superior.

Also, if either superior or inferior is defined as a signed type, there can be problems during the reconstruction. If they're smaller than int (as of course they necessarily are), they'll be immediately sign-extended to int before the rest of the reconstruction happens, demolishing the result. (That's why I explicitly declared superior and inferior as type unsigned char in my example. If your byte type is a typedef for unsigned char, that would be fine, too.)

There's also an obscure overflow possibility lurking in the subexpression superior << 8 even when superior is unsigned, although it's unlikely to cause a problem in practice. (See Eric Postpischil's comments for additional explanation.)
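To make the sign-extension problem described above concrete, here is a small sketch that contrasts a deliberately buggy reconstruction with a safe one. Both function names are hypothetical. The safe version also casts the high byte to unsigned before shifting, which sidesteps the overflow possibility.

```c
/* Buggy on purpose: the low byte is held in a signed char, so a value with
   its top bit set is sign-extended before it is OR-ed in. The result is
   masked to 16 bits only to make the damage easy to see. */
static unsigned join_signed_low(unsigned char hi, signed char lo)
{
    return (((unsigned)hi << 8) | (unsigned)lo) & 0xffffu;
}

/* Safe: both bytes are unsigned char, and hi is cast to unsigned before
   the shift, so the shift cannot overflow a 16-bit int. */
static unsigned join_unsigned(unsigned char hi, unsigned char lo)
{
    return ((unsigned)hi << 8) | (unsigned)lo;
}
```

Passing a high byte of 0x01 and a low byte of -128 (the signed reading of 0x80) to `join_signed_low` yields 0xff80, while `join_unsigned(0x01, 0x80)` yields the intended 0x0180.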

Upvotes: 2

i486

Reputation: 6573

Simply define a union:

typedef union
{
   int           as_int;
   unsigned char as_byte[2];
} INT2BYTE;

INT2BYTE i2b;

Put the integer value in the i2b.as_int member and read the equivalent bytes from i2b.as_byte[0] and i2b.as_byte[1]. Which of the two holds the high byte depends on the endianness of the machine.
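A usage sketch for the union idea follows. It uses `unsigned short` instead of `int` so that the two-element byte array matches the value's size on a PC as well, which assumes `unsigned short` is exactly 2 bytes. The type and function names are invented here, and reading back a different union member than the one last written relies on the type-punning behavior that C99 and later describe.

```c
/* Hypothetical variant of the union above for a 2-byte unsigned value. */
typedef union {
    unsigned short as_u16;
    unsigned char  as_byte[2];
} U16Bytes;

/* Store the value, then read its bytes in memory order. */
static void u16_to_bytes(unsigned short value, unsigned char out[2])
{
    U16Bytes u;

    u.as_u16 = value;
    out[0] = u.as_byte[0];
    out[1] = u.as_byte[1];
}

/* Store the bytes in the same memory order, then read the value back. */
static unsigned short bytes_to_u16(const unsigned char in[2])
{
    U16Bytes u;

    u.as_byte[0] = in[0];
    u.as_byte[1] = in[1];
    return u.as_u16;
}
```

The round trip only works unchanged when sender and receiver share the same byte order.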

Upvotes: 0

Mochuelo

Reputation: 76

I am using short int instead of plain int, because on the PC an int is 4 bytes while on my target platform it is 2. I use unsigned to make debugging easier.

The code compiles with GCC (and should with almost any other C compiler). If I'm not mistaken, the result depends on whether the architecture is big-endian or little-endian, but that would be solved by inverting the line that reconstructs the integer:

#include <stdio.h>

int main(void)
{
    // unsigned short int is 2 bytes on a typical 32-bit PC
    unsigned short int valor;
    unsigned short int reassembled;
    unsigned char data0 = 0;
    unsigned char data1 = 0;

    // sizeof yields a size_t, so cast it to match the %u conversion
    printf("An integer is %u bytes\n", (unsigned)sizeof(valor));

    printf("Enter a number: \n");
    // %hu is the conversion for unsigned short; stop if no number was read
    if (scanf("%hu", &valor) != 1)
        return 1;
    // Decompose the integer into 2 bytes
    data0 = (unsigned char)(valor & 0x00FF);
    data1 = (unsigned char)((valor >> 8) & 0x00FF);
    // Just a bit of 'feedback'
    printf("Integer: %u \n", (unsigned)valor);
    printf("Hexa: %X \n", (unsigned)valor);
    printf("Byte 0: %u - %X \n", (unsigned)data0, (unsigned)data0);
    printf("Byte 1: %u - %X \n", (unsigned)data1, (unsigned)data1);
    // Reassemble the integer from 2 bytes (cast before shifting to avoid overflow)
    reassembled = (unsigned short int)(((unsigned)data1 << 8) | data0);
    // Show the rebuilt number
    printf("Reassembled Integer: %u \n", (unsigned)reassembled);
    printf("Reassembled Hexa: %X \n", (unsigned)reassembled);
    return 0;
}

Upvotes: -1

Sourav Ghosh

Reputation: 134356

You can actually cast the address of the integer variable to a character pointer (unsigned char *, to be accurate), read the value, and then increment the pointer to point to the next byte and read again. This conforms to the aliasing rules, because any object may be accessed through a character type.
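A minimal sketch of that pointer walk is shown below. The function names `int_bytes` and `int_from_bytes` are hypothetical, and the bytes come out in whatever order the machine stores them, so the same endianness caveats apply as for the other memory-based approaches.

```c
#include <stddef.h>

/* Copy the bytes of *value into out, one at a time, by stepping an
   unsigned char pointer through the object. Returns the number copied. */
static size_t int_bytes(const int *value, unsigned char *out, size_t out_size)
{
    const unsigned char *p = (const unsigned char *)value;
    size_t i;

    for (i = 0; i < sizeof *value && i < out_size; i++)
        out[i] = *p++;             /* read a byte, then advance the pointer */
    return i;
}

/* Rebuild an int from sizeof(int) bytes stored in the same order. */
static int int_from_bytes(const unsigned char *in)
{
    int value;
    unsigned char *p = (unsigned char *)&value;
    size_t i;

    for (i = 0; i < sizeof value; i++)
        *p++ = in[i];
    return value;
}
```

On the two-byte int target from the question, `int_bytes` would fill exactly two bytes.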

Upvotes: 0
