Reputation: 123
fread(cur, 2, 1, fin)
I am sure I will feel stupid when I get an answer to this, but what is happening?
cur is a pointer to code_cur, a short (2 bytes); fin is a stream open for binary reading.
If my file is 00101000 01000000
what I get in the end is
code_cur = 01000000 00101000
Why is that? I am not including any more context yet because the problem really boils down to this (at least for me) unexpected behaviour.
And, in case this is the norm, how can I obtain the desired effect?
P.S.
I should probably add that, in order to 'view' the bytes, I am printing their integer value.
printf("%d\n",code_cur)
I tried it a couple times and it seemed reliable.
Upvotes: 6
Views: 1375
Reputation: 711
As others have pointed out, this is an endianness issue.
The byte order differs between your file and your machine. Your file is big-endian (MSB first) and your machine is little-endian (MSB last, or LSB first).
To understand what's happening, let's create a file with some binary data:
uint8_t buffer[2] = {0x28, 0x40}; // hexadecimal for 00101000 01000000
FILE * fp = fopen("file.bin", "wb"); // opens or creates file as binary
fwrite(buffer, 1, 2, fp); // write two bytes to file
fclose(fp);
The file.bin was created and holds the binary value 00101000 01000000. Let's read it:
uint8_t buffer[2] = {0, 0};
FILE * fp = fopen("file.bin", "rb");
fread(buffer, 1, 2, fp); // read two bytes from file
fclose(fp);
printf("0x%02x, 0x%02x\n", buffer[0], buffer[1]);
// The above prints 0x28, 0x40, as expected and in the order we wrote previously
So everything works well because we are reading byte by byte, and single bytes don't have endianness (technically they do, since they are always treated as Most Significant Bit first regardless of your machine, but you can think of them as if they didn't to simplify things).
Anyways, as you noticed, here's what happens when you try to read the short directly:
FILE * fp_alt = fopen("file.bin", "rb");
short incorrect_short = 0;
fread(&incorrect_short, 1, 2, fp_alt);
fclose(fp_alt);
printf("Read short as machine endianness: %hd\n", incorrect_short);
printf("In hex, that is 0x%04x\n", incorrect_short);
// We get the incorrect decimal of 16424 and hex of 0x4028!
// The two bytes were swapped because a little-endian machine treats the first byte in memory as the least significant one
The worst part is that on a big-endian machine the code above would return the correct number, leaving you unaware that your code is endian-specific and not portable between processors!
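If you want to see which kind of machine you are on, a minimal sketch like the one below inspects the first byte of a known 16-bit value. The name host_is_little_endian is a hypothetical helper introduced here for illustration, not something from a standard library.

```c
#include <stdint.h>
#include <string.h>

// Hypothetical helper (assumption, not a library function): returns 1 when the
// host stores the least significant byte at the lowest address, 0 otherwise.
int host_is_little_endian(void) {
    const uint16_t probe = 0x0001;
    uint8_t first_byte;
    memcpy(&first_byte, &probe, 1); // copy the byte at the lowest address
    return first_byte == 0x01;
}
```

On the asker's machine this would presumably return 1, which matches the swapped result they observed.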
It's nice to use ntohs from arpa/inet.h to convert the endianness, but I find it strange: it's a (non-standard) networking header being used to solve an issue that comes from reading files, and it solves it by reading the value incorrectly from the file and then 'translating' the incorrect value instead of just reading it correctly.
In higher-level languages we often see functions that read a value from a file with a given endianness instead of converting it afterwards, because we (usually) know the file's structure and its endianness. Just look at the readInt16BE method of the Node.js Buffer: straight to the point and easy to use.
Motivated by this simplicity, I created a function that reads a 16-bit integer below (but it's very easy to change to 8, 32 or 64 bits if you need to):
#include <stdint.h> // necessary for specific int types
#include <stdio.h> // necessary for FILE and fread
// Advances and reads a single signed 16-bit integer from the file stream as Big Endian
// Writes the value to 'result' pointer
// Returns 1 if succeeds or 0 if it fails
int8_t freadInt16BE(int16_t * result, FILE * f) {
uint8_t buffer[sizeof(int16_t)];
if (!result || !f || sizeof(int16_t) != fread((void *) buffer, 1, sizeof(int16_t), f))
return 0;
*result = (int16_t)((buffer[0] << 8) | buffer[1]); // parentheses are required: + and | bind differently than <<
return 1;
}
Usage is simple (error handling omitted for brevity):
FILE * fp = fopen("file.bin", "rb"); // Open file as binary
short code_cur = 0;
freadInt16BE(&code_cur, fp);
fclose(fp);
printf("Read Big-Endian (MSB first) short: %hd\n", code_cur);
printf("In hex, that is 0x%04x\n", code_cur);
// The above code prints 0x2840 correctly (decimal: 10304)
The function fails (returns 0) if the stream is NULL, for instance because the file doesn't exist or can't be opened, or if fewer than 2 bytes remain at the current position.
As a bonus, if you happen to find a file that is little-endian, you can use this function:
// Advances and reads a single signed 16-bit integer from the file stream as Little Endian
// Writes the value to 'result' pointer
// Returns 1 if succeeds or 0 if it fails
int8_t freadInt16LE(int16_t * result, FILE * f) {
uint8_t buffer[sizeof(int16_t)];
if (!result || !f || sizeof(int16_t) != fread((void *) buffer, 1, sizeof(int16_t), f))
return 0;
*result = (int16_t)((buffer[1] << 8) | buffer[0]); // parentheses are required: + and | bind differently than <<
return 1;
}
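To illustrate the remark that the approach adapts easily to other widths, here is a rough sketch of a 32-bit big-endian variant. The name freadUInt32BE and the unsigned result type are assumptions made for this example; the structure simply mirrors freadInt16BE.

```c
#include <stdint.h>
#include <stdio.h>

// Hypothetical 32-bit counterpart of freadInt16BE (the name is an assumption).
// Reads four bytes from the stream and assembles them most significant byte first.
// Returns 1 on success, 0 on failure.
int freadUInt32BE(uint32_t *result, FILE *f) {
    uint8_t buffer[4];
    if (!result || !f || fread(buffer, 1, sizeof buffer, f) != sizeof buffer)
        return 0;
    // Cast each byte before shifting so the top byte never lands in the sign bit of an int
    *result = ((uint32_t)buffer[0] << 24) | ((uint32_t)buffer[1] << 16)
            | ((uint32_t)buffer[2] << 8)  |  (uint32_t)buffer[3];
    return 1;
}
```

For a file containing the bytes 0x12 0x34 0x56 0x78, this should store 0x12345678 in the result on any host. A little-endian version would only reverse the buffer indices.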
Upvotes: 0
Reputation: 716
As others have pointed out, you need to learn more about endianness.
You don't know it but your file is (luckily) in Network Byte Order (which is Big Endian). Your machine is little endian, so a correction is needed. Needed or not, this correction is always recommended as this will guarantee that your program runs everywhere.
Do something similar to this:
{
uint16_t tmp;
if (1 == fread(&tmp, 2, 1, fin)) { /* Check fread finished well */
code_cur = ntohs(tmp); // requires #include <arpa/inet.h>
} else {
/* Treat error however you see fit */
perror("Error reading file");
exit(EXIT_FAILURE); // requires #include <stdlib.h>
}
}
ntohs() will convert your value from network (big-endian) order, which is the order used in your file, to your machine's order, whatever it is, big or little endian.
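To see the conversion in isolation, without a file, the sketch below feeds ntohs the same two bytes from the question. The helper name be_bytes_to_host is hypothetical, and the include assumes a POSIX system.

```c
#include <arpa/inet.h> // POSIX header that declares ntohs
#include <stdint.h>
#include <string.h>

// Hypothetical helper (assumption): interprets two bytes stored in file
// (big-endian) order as a value in the host's byte order.
uint16_t be_bytes_to_host(const uint8_t bytes[2]) {
    uint16_t raw;
    memcpy(&raw, bytes, sizeof raw); // same memory layout fread(&raw, 2, 1, fin) would produce
    return ntohs(raw);               // a no-op on big-endian hosts, a byte swap on little-endian ones
}
```

With the bytes {0x28, 0x40} this should return 0x2840 on either kind of machine.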
Upvotes: 3
Reputation: 1292
This is why htonl and htons (and friends) exist. They're not part of the C standard library, but they're available on pretty much every platform that does networking.
"htonl" means "host to network, long"; "htons" means "host to network, short". In this context, "long" means 32 bits, and "short" means 16 bits (even if the platform declares "long" to be 64 bits). Basically, whenever you read something from the "network" (or in your case, the stream you're reading from), you pass it through "ntoh*". When you're writing out, you pass it through "hton*".
You can combine the parts of those function names in whatever way you want, except for the silly ones (no, there is no ntons, and no stonl either).
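As a rough sketch of that rule applied to both directions, the two helpers below write a 16-bit value through htons and read it back through ntohs. The names write_u16_network and read_u16_network are hypothetical, and the include assumes a POSIX system.

```c
#include <arpa/inet.h> // POSIX header that declares htons and ntohs
#include <stdint.h>
#include <stdio.h>

// Hypothetical helper (assumption): writes value to the stream in network (big-endian) order.
// Returns 1 on success, 0 on failure.
int write_u16_network(uint16_t value, FILE *f) {
    uint16_t wire = htons(value); // host order -> network order
    return fwrite(&wire, sizeof wire, 1, f) == 1;
}

// Hypothetical helper (assumption): reads a network-order value and converts it to host order.
// Returns 1 on success, 0 on failure.
int read_u16_network(uint16_t *value, FILE *f) {
    uint16_t wire;
    if (!value || fread(&wire, sizeof wire, 1, f) != 1)
        return 0;
    *value = ntohs(wire); // network order -> host order
    return 1;
}
```

Writing 0x2840 this way should put the bytes 0x28 0x40 in the file, in that order, on any host.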
Upvotes: 1