Temitope.A

Reputation: 123

C - binary reading, fread is inverting the order

    fread(cur, 2, 1, fin);

I am sure I will feel stupid when I get an answer to this, but what is happening?

cur is a pointer to code_cur, a short (2 bytes), and fin is a stream opened for binary reading.

If my file is 00101000 01000000

what I get in the end is

code_cur = 01000000 00101000

Why is that? I am not giving any more context yet, because the problem really boils down to this (at least to me) unexpected behaviour.

And, in case this is the norm, how can I obtain the desired effect?

P.S.

I should probably add that, in order to 'view' the bytes, I am printing their integer value.

    printf("%d\n", code_cur);

I tried it a couple times and it seemed reliable.

Upvotes: 6

Views: 1375

Answers (3)

Guilherme Rossato

Reputation: 711

As others have pointed out, this is an endianness issue.

Your file and your machine disagree on byte order. Your file is big-endian (most significant byte first) and your machine is little-endian (least significant byte first).
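To check which order your own machine uses, a minimal sketch is to copy a known 16-bit value into a byte array and look at which byte comes first. The helper name host_is_little_endian is hypothetical, not a standard function:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the host stores the least significant
   byte first (little-endian), 0 if it stores the most significant byte
   first (big-endian). */
int host_is_little_endian(void) {
    uint16_t probe = 0x0102;             /* MSB = 0x01, LSB = 0x02 */
    unsigned char bytes[sizeof probe];
    memcpy(bytes, &probe, sizeof probe); /* inspect the in-memory layout */
    return bytes[0] == 0x02;             /* LSB stored first => little-endian */
}
```

On common desktop processors (x86-64, and ARM as it is usually configured) this returns 1, which is consistent with the swapped value reported in the question.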

To understand what's happening, let's create a file with some binary data:

    uint8_t buffer[2] = {0x28, 0x40}; // hexadecimal for 00101000 01000000
    FILE * fp = fopen("file.bin", "wb"); // opens or creates file as binary
    fwrite(buffer, 1, 2, fp); // write two bytes to file
    fclose(fp);

file.bin now exists and holds the binary value 00101000 01000000. Let's read it back:

    uint8_t buffer[2] = {0, 0};
    FILE * fp = fopen("file.bin", "rb");
    fread(buffer, 1, 2, fp); // read two bytes from file
    fclose(fp);
    printf("0x%02x, 0x%02x\n", buffer[0], buffer[1]);
    // The above prints 0x28, 0x40, as expected and in the order we wrote previously

So everything works as expected, because we are reading byte by byte and a single byte has no endianness: the order of bits inside a byte is never visible to a C program, so you can treat each byte as an indivisible unit.

Anyway, as you noticed, here's what happens when you try to read the short directly:

    FILE * fp_alt = fopen("file.bin", "rb");
    short incorrect_short = 0;
    fread(&incorrect_short, 1, 2, fp_alt);
    fclose(fp_alt);
    printf("Read short as machine endianness: %hu\n", incorrect_short);
    printf("In hex, that is 0x%04x\n", incorrect_short);
    // On a little-endian machine we get the incorrect decimal 16424 (hex 0x4028)!
    // fread copied the bytes in file order, but the machine treats the first byte (0x28) as the least significant one

The worst part is that on a big-endian machine the code above would return the correct number, leaving you unaware that your code is endian-specific and not portable between processors!

You can use ntohs from arpa/inet.h to convert the endianness, but I find that odd: it pulls in a whole (non-standard) networking header to solve a problem that comes from reading files, and it works by reading the value in the wrong order and then 'translating' it, instead of reading it correctly in the first place.

In higher-level languages we often see functions that read a value with an explicit endianness, instead of converting it afterwards, because we (usually) know a file's structure and its endianness. Just look at the readInt16BE method of Node.js's Buffer: straight to the point and easy to use.

Motivated by this simplicity, I wrote the function below, which reads a 16-bit integer (it's easy to adapt it to 8, 32 or 64 bits if you need to):
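A small sketch makes the portability problem concrete. Both hypothetical helpers below start from the same two bytes: the memcpy version mimics what fread into a short does and depends on the host, while the shift version always treats the first byte as the most significant one:

```c
#include <stdint.h>
#include <string.h>

/* Mimics fread(&value, 2, 1, fp): copies the raw bytes into a uint16_t,
   so the result depends on the host's byte order. */
uint16_t bytes_as_host_u16(const uint8_t bytes[2]) {
    uint16_t value;
    memcpy(&value, bytes, sizeof value);
    return value;
}

/* Treats bytes[0] as the most significant byte, on any host. */
uint16_t bytes_as_be_u16(const uint8_t bytes[2]) {
    return (uint16_t)((bytes[0] << 8) | bytes[1]);
}
```

With the bytes {0x28, 0x40}, bytes_as_be_u16 returns 0x2840 everywhere, while bytes_as_host_u16 returns 0x4028 on a little-endian machine and 0x2840 on a big-endian one.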
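For comparison, Buffer.readInt16BE(offset) reads two bytes at offset as a signed big-endian integer. A rough C analogue over an in-memory byte array might look like the sketch below; the name read_int16_be and its "return 0 when out of range" convention are assumptions made for illustration:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper loosely modelled on Buffer.readInt16BE: reads a
   signed big-endian 16-bit integer at buf[offset].
   Returns 1 on success, 0 if fewer than 2 bytes are available there. */
int read_int16_be(const uint8_t *buf, size_t len, size_t offset, int16_t *out) {
    if (buf == NULL || out == NULL || len < 2 || offset > len - 2)
        return 0;
    uint16_t raw = (uint16_t)((buf[offset] << 8) | buf[offset + 1]);
    /* Map 0x8000..0xFFFF onto negative values without relying on the
       implementation-defined unsigned-to-signed conversion. */
    *out = raw <= INT16_MAX ? (int16_t)raw : (int16_t)((int32_t)raw - 65536);
    return 1;
}
```

For example, the bytes 0xFF 0xFE decode to -2, and asking for an offset with only one byte left returns 0.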

    #include <stdint.h> // necessary for specific int types
    #include <stdio.h>  // FILE and fread

    // Advances and reads a single signed 16-bit integer from the stream as Big Endian
    // Writes the value to 'result' pointer
    // Returns 1 if it succeeds or 0 if it fails
    int8_t freadInt16BE(int16_t * result, FILE * f) {
        uint8_t buffer[sizeof(int16_t)];
        if (!result || !f || sizeof(int16_t) != fread((void *) buffer, 1, sizeof(int16_t), f))
            return 0;
        // Parenthesize the shift: '+' binds tighter than '<<'
        *result = (int16_t)((buffer[0] << 8) | buffer[1]);
        return 1;
    }

Usage is simple (error handling omitted for brevity):

    FILE * fp = fopen("file.bin", "rb"); // Open file as binary
    short code_cur = 0;
    freadInt16BE(&code_cur, fp);
    fclose(fp);
    printf("Read Big-Endian (MSB first) short: %hu\n", code_cur);
    printf("In hex, that is 0x%04x\n", code_cur);
    // The above code prints 0x2840 correctly (decimal: 10304)

The function fails (returns 0) if either pointer is NULL (for example, because fopen could not find or open the file) or if fewer than 2 bytes remain at the current position in the file.
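Since the usage example skips error handling, here is a hedged sketch of a caller that checks each step while reading every big-endian value in a stream. It includes a copy of freadInt16BE with the shift parenthesized so that it compiles on its own; readAllInt16BE and its "return -1 on error" convention are hypothetical:

```c
#include <stdint.h>
#include <stdio.h>

/* freadInt16BE with the shift parenthesized, copied so this block compiles alone. */
int8_t freadInt16BE(int16_t * result, FILE * f) {
    uint8_t buffer[sizeof(int16_t)];
    if (!result || !f || sizeof(int16_t) != fread((void *) buffer, 1, sizeof(int16_t), f))
        return 0;
    *result = (int16_t)((buffer[0] << 8) | buffer[1]);
    return 1;
}

/* Hypothetical helper: reads big-endian 16-bit values into out until end of
   file or until max values are stored. Returns the number of values read,
   or -1 on bad arguments or an I/O error. */
long readAllInt16BE(FILE * f, int16_t * out, size_t max) {
    if (!f || !out)
        return -1;
    size_t count = 0;
    while (count < max && freadInt16BE(&out[count], f))
        count++;
    if (ferror(f))
        return -1; /* a real read error, not just end of file */
    return (long)count;
}
```

Note that a trailing odd byte is silently ignored here; a stricter caller could check feof and the file size to report a truncated value.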

As a bonus, if you happen to find a file that is little-endian, you can use this function:

    // Advances and reads a single signed 16-bit integer from the stream as Little Endian
    // Writes the value to 'result' pointer
    // Returns 1 if it succeeds or 0 if it fails
    int8_t freadInt16LE(int16_t * result, FILE * f) {
        uint8_t buffer[sizeof(int16_t)];
        if (!result || !f || sizeof(int16_t) != fread((void *) buffer, 1, sizeof(int16_t), f))
            return 0;
        // The second byte is the most significant one in a little-endian file
        *result = (int16_t)((buffer[1] << 8) | buffer[0]);
        return 1;
    }
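The answer notes that the 16-bit reader is easy to adapt to other widths. A possible 32-bit big-endian version is sketched below (freadInt32BE is a hypothetical name following the same scheme). The casts to uint32_t matter: without them, buffer[0] << 24 is computed as a plain int and overflows, which is undefined behaviour, whenever buffer[0] is 0x80 or more:

```c
#include <stdint.h>
#include <stdio.h>

// Advances and reads a single signed 32-bit integer from the stream as Big Endian
// Returns 1 if it succeeds or 0 if it fails (NULL pointers or fewer than 4 bytes)
int8_t freadInt32BE(int32_t * result, FILE * f) {
    uint8_t buffer[sizeof(int32_t)];
    if (!result || !f || sizeof(int32_t) != fread(buffer, 1, sizeof(int32_t), f))
        return 0;
    // Widen each byte to uint32_t before shifting to stay clear of the sign bit
    uint32_t raw = ((uint32_t)buffer[0] << 24) | ((uint32_t)buffer[1] << 16)
                 | ((uint32_t)buffer[2] << 8)  |  (uint32_t)buffer[3];
    // Values above INT32_MAX wrap to negative on common two's-complement compilers
    *result = (int32_t)raw;
    return 1;
}
```

For a little-endian file, reverse the byte indices so that buffer[3] becomes the most significant byte.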

Upvotes: 0

Toni Homedes i Saun

Reputation: 716

As others have pointed out, you need to learn more about endianness.

You may not know it, but your file is (luckily) in network byte order, which is big-endian. Your machine is little-endian, so a conversion is needed. Even on machines where it is not needed, doing the conversion is recommended, because it guarantees that your program works everywhere.

Do something similar to this:

    {
        uint16_t tmp;

        if (1 == fread(&tmp, 2, 1, fin)) { /* Check fread finished well */
            code_cur = ntohs(tmp); // requires #include <arpa/inet.h> (POSIX)
        } else {
            /* Treat error however you see fit */
            perror("Error reading file");
            exit(EXIT_FAILURE); // requires #include <stdlib.h>
        }
    }

ntohs() will convert your value from file order to your machine's order, whatever it is, big or little endian.
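A self-contained variant of the snippet above might wrap it in a function that reports failure instead of exiting. This sketch assumes a POSIX system where ntohs is declared in <arpa/inet.h>; the name read_u16_network_order is hypothetical:

```c
#include <arpa/inet.h> /* ntohs (POSIX, not ISO C) */
#include <stdint.h>
#include <stdio.h>

/* Reads one 16-bit value stored in network (big-endian) order.
   Returns 1 on success, 0 if fewer than 2 bytes could be read. */
int read_u16_network_order(FILE *fin, uint16_t *out) {
    uint16_t tmp;
    if (1 != fread(&tmp, sizeof tmp, 1, fin))
        return 0;
    *out = ntohs(tmp); /* no-op on big-endian hosts, byte swap on little-endian ones */
    return 1;
}
```

Given a file containing the bytes 0x28 0x40, it stores 0x2840 (10304) in *out on both little- and big-endian hosts.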

Upvotes: 3

Wouter Verhelst
Wouter Verhelst

Reputation: 1292

This is why htonl and htons (and friends) exist. They're not part of the C standard library (they come from POSIX), but they're available on pretty much every platform that does networking.

"htonl" means "host to network, long"; "htons" means "host to network, short". In this context, "long" means 32 bits, and "short" means 16 bits (even if the platform declares "long" to be 64 bits). Basically, whenever you read something from the "network" (or in your case, the stream you're reading from), you pass it through "ntoh*". When you're writing out, you pass it through "hton*".

You can permute those function names any way you want, except for the silly ones (no, there is no ntons, and no stonl either).
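A short sketch of the write-with-hton*, read-with-ntoh* discipline described above, assuming POSIX <arpa/inet.h>. The names write_record and read_record and the record layout (a 32-bit value followed by a 16-bit value) are invented for the example:

```c
#include <arpa/inet.h> /* htonl, htons, ntohl, ntohs (POSIX) */
#include <stdint.h>
#include <stdio.h>

/* Writes a 32-bit and a 16-bit value in network (big-endian) order.
   Returns 1 on success, 0 on a short write. */
int write_record(FILE *f, uint32_t a, uint16_t b) {
    uint32_t na = htonl(a); /* host -> network, 32 bits */
    uint16_t nb = htons(b); /* host -> network, 16 bits */
    return fwrite(&na, sizeof na, 1, f) == 1 && fwrite(&nb, sizeof nb, 1, f) == 1;
}

/* Reads the same record back and converts it to host order.
   Returns 1 on success, 0 if the record is incomplete. */
int read_record(FILE *f, uint32_t *a, uint16_t *b) {
    uint32_t na;
    uint16_t nb;
    if (fread(&na, sizeof na, 1, f) != 1 || fread(&nb, sizeof nb, 1, f) != 1)
        return 0;
    *a = ntohl(na); /* network -> host, 32 bits */
    *b = ntohs(nb); /* network -> host, 16 bits */
    return 1;
}
```

Whatever the host, the file always holds the most significant byte of each value first, so the same file can be read back on a machine with the opposite endianness.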

Upvotes: 1
