Reputation: 1117
Basically I have a file, and in this file I am writing 3 bytes, and then I'm writing a 4-byte integer. In another application I read the first 3 bytes, and then I read the next 4 bytes and convert them to an integer.
When I print out the value, I get a very different result...
fwrite(&recordNum, 2, 1, file); //The first 2 bytes (recordNum is a short int)
fwrite(&charval, 1, 1, file); //charval is a single byte char
fwrite(&time, 4, 1, file);
// I continue writing a total of 40 bytes
Here is how time was calculated:
time_t rawtime;
struct tm * timeinfo;
time(&rawtime);
timeinfo = localtime(&rawtime);
int time = (int)rawtime;
I have tested to see that sizeof(time) is 4 bytes, and it is. I have also tested using an epoch converter to make sure this is the correct time (in seconds) and it is.
Now, in another file I read the 40 bytes to a char buffer:
char record[40];
fread(record, 1, 40, file);
// Then I convert those 4 bytes into a uint32_t
uint32_t timestamp =(uint32_t)record[6] | (uint32_t)record[5] << 8 | (uint32_t)record[4] << 16 | (uint32_t)record[3] << 24;
printf("Testing timestamp = %d\n", timestamp);
But this prints out -6624. The expected value is 551995007.
EDIT
To be clear, everything else that I am reading from the char buffer is correct. After this timestamp I have text, which I simply print and it runs fine.
Upvotes: 2
Views: 886
Reputation: 5857
Your problem is probably right here:
uint32_t timestamp =(uint32_t)record[6] | (uint32_t)record[5] << 8 | (uint32_t)record[4] << 16 | (uint32_t)record[3] << 24;
printf("Testing timestamp = %d\n", timestamp);
You've used fwrite to write out a 32-bit integer in whatever order the processor stored it in memory, and you don't actually know what byte ordering (endianness) the machine used. Maybe the first byte written out is the lowest byte of the integer, or maybe it's the highest byte.
If you're reading and writing the data on the same machine, or on different machines with the same architecture, you don't need to care about that; it will work. But if the data is written on an architecture with one byte ordering and potentially read on an architecture with another byte ordering, it will be wrong: your code needs to know what order the bytes are in memory and what order they will be read/written on disk.
In this case, your code is doing a mix of both: you write the bytes out in whatever endianness the machine uses natively, then when you read them in, you start shifting the bits around as if you know what order they were originally written in. But you don't, because you didn't pay attention to the order when you wrote them out.
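If you want to see which order your machine actually uses, here is a minimal sketch (not part of the original code) that dumps the in-memory bytes of a 32-bit value:
#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main(void) {
    uint32_t value = 0x11223344;           /* four distinct bytes, so the order is visible */
    unsigned char bytes[sizeof(value)];

    memcpy(bytes, &value, sizeof(value));  /* copy out the raw in-memory representation */

    /* A little-endian machine prints "44 33 22 11"; a big-endian one prints "11 22 33 44". */
    for (size_t i = 0; i < sizeof(bytes); i++)
        printf("%02x ", (unsigned)bytes[i]);
    printf("\n");
    return 0;
}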
So if you're writing and reading the file on the same machine, or an identical machine (same processor, OS, compiler, etc.), just write the bytes out in the native order (without worrying about what that is) and then read them back exactly as you wrote them out. If you write and read them on the same machine, it'll work.
So if your timestamp is located at offsets 3 through 6 of your record, just do this:
uint32_t timestamp;
memcpy(&timestamp, record + 3, sizeof(timestamp));
Note that you cannot directly cast record + 3 to a uint32_t pointer, because doing so might violate the system's word alignment requirements.
Note also that you should probably be using the time_t type to hold the timestamp; if you're on a Unix-like system, that's the natural type supplied to hold epoch time values.
But if you are planning to move this file to another machine at any point and read it there, you could easily end up with your data on a system that has a different endianness or a different size for time_t. Simply writing bytes in and out of a file with no thought to the endianness or size of types on different operating systems is just fine for temporary files, or for files which are meant to be used on one computer only and which will never be moved to other types of system.
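If you do care about portability, one option is to pick a fixed on-disk representation and convert explicitly on both ends. Here is a minimal sketch under the assumption that you standardize on an 8-byte big-endian timestamp field (the helper names are hypothetical, not from any library):
#include <stdint.h>
#include <time.h>

/* Hypothetical helpers: store the timestamp on disk as a fixed 8-byte
   big-endian field, regardless of the host's time_t size or endianness. */
void write_timestamp_be64(unsigned char out[8], time_t t) {
    uint64_t v = (uint64_t)(int64_t)t;
    for (int i = 0; i < 8; i++)
        out[i] = (unsigned char)(v >> (56 - 8 * i));  /* most significant byte first */
}

time_t read_timestamp_be64(const unsigned char in[8]) {
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v = (v << 8) | in[i];  /* rebuild from the most significant byte down */
    return (time_t)(int64_t)v;
}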
Making data files that are portable between systems is a whole subject in itself. But the first thing you should do, if you care about that, is look at the functions htons(), ntohs(), htonl(), ntohl(), and their ilk, which convert between the system's native endianness and a known (big) endianness that is the standard for internet communications and generally used for interoperability (even though Intel processors are little-endian and dominate the market these days). These functions do something similar to what you were doing with your bit-shifting, but since someone else wrote them, you don't have to. It's a lot easier to use the library functions for this!
For example:
#include <stdio.h>
#include <stdint.h>
#include <arpa/inet.h>

int main() {
    uint32_t x = 1234, z;

    // Open a file for writing, convert x from native to big endian, write it.
    FILE *file = fopen("foo.txt", "w");
    z = htonl(x);
    fwrite(&z, sizeof(z), 1, file);
    fclose(file);

    // Read it back, then convert from big endian back to the native order.
    file = fopen("foo.txt", "r");
    fread(&z, sizeof(z), 1, file);
    x = ntohl(z);
    fclose(file);

    printf("%u\n", x);
}
NOTE: I am NOT CHECKING FOR ERRORS in this code; it is just an example. Do not use functions like fopen, fread, etc. without checking for errors.
By using these functions both when writing the data out to disk and when reading it back, you guarantee that the data on disk is always big-endian: htonl() does nothing on a big-endian platform, while on a little-endian platform it converts from little-endian to big-endian, and ntohl() does the opposite. So your data on disk will always be read in correctly.
Upvotes: 2
Reputation: 223689
You write the time all at once with fwrite, which uses the native byte ordering, then you explicitly read the individual bytes in big-endian order (most significant byte first). Your machine is likely little-endian, which would explain the difference.
You need to read/write in a consistent manner. The simplest way to do this is to fread one variable at a time, just like you're writing:
fread(&recordNum, sizeof(recordNum), 1, file);
fread(&charval, sizeof(charval), 1, file);
fread(&time, sizeof(time), 1, file);
Also note the use of sizeof to calculate the size.
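For instance, here's a self-contained read-back sketch (the filename is assumed, and the question's time variable is renamed so it doesn't shadow the standard time() function):
#include <stdio.h>

int main(void) {
    short recordNum;
    char charval;
    int time_val;  /* the question's "time", renamed to avoid shadowing time() */

    FILE *file = fopen("records.dat", "rb");
    if (!file) return 1;

    /* Read the fields back in the same order and with the same sizes they were written. */
    if (fread(&recordNum, sizeof(recordNum), 1, file) != 1 ||
        fread(&charval, sizeof(charval), 1, file) != 1 ||
        fread(&time_val, sizeof(time_val), 1, file) != 1) {
        fclose(file);
        return 1;
    }
    fclose(file);

    printf("recordNum=%hd charval=%c timestamp=%d\n", recordNum, charval, time_val);
    return 0;
}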
Upvotes: 3