FGV

Reputation: 165

fread sometimes returns wrong values with files larger than 4GiB

I'm trying to read a big binary file made of 30e6 positions, each with 195 doubles. Since the file is too big to read into memory all at once, I'm reading it in chunks of 10000 positions. I then do some calculations on each chunk and read the next one.

Since I need random access to the file, I've written a function that reads a given chunk (unsigned int chunk) from the file and stores it in **chunk_data. The function returns the total number of positions read.

unsigned int read_chunk(double **chunk_data, unsigned int chunk) {
    FILE *in_glf_fh;
    unsigned int total_bytes_read = 0;

    // Define chunk start and end positions
    unsigned int start_pos = chunk * 10000;
    unsigned int end_pos = start_pos + 10000 - 1;
    unsigned int chunk_size = end_pos - start_pos + 1;

    // Open input file
    in_glf_fh = fopen(in_glf, "rb");
    if( in_glf_fh == NULL )
        error("ERROR: cannot open file!");

    // Search start position
    if( fseek(in_glf_fh, start_pos * 195 * sizeof(double), SEEK_SET) != 0 )
        error("ERROR: cannot seek file!");

    // Read data from file
    for(unsigned int c = 0; c < chunk_size; c++) {
         unsigned int bytes_read = fread ( (void*) chunk_data[c], sizeof(double), 195, in_glf_fh);
         if( bytes_read != 195 && !feof(in_glf_fh) )
             error("ERROR: cannot read file!");
         total_bytes_read += bytes_read;
    }

    fclose(in_glf_fh);
    return( total_bytes_read/195 );
}

The problem is that, after reading some chunks, fread() starts giving wrong values! Also, the chunk at which fread() starts behaving strangely depends on the chunk size:

chunk of 1 pos, wrong at chunk 22025475
chunk of 10000 pos, wrong at chunk 2203
chunk of 100000 pos, wrong at chunk 221

Does anyone have any idea what might be going on?

Upvotes: 2

Views: 1044

Answers (1)

wallyk

Reputation: 57774

Taking 30e6 positions to mean 30,000,000 (decimal, not hex), consider the problem with fseek(): the file has 30,000,000 × 195 × 8 = 46,800,000,000 bytes. Plain fseek() takes its offset as a long, so on platforms where long is 32 bits (typical 16- and 32-bit platforms) it can only reach the first 2^31-1 bytes (= 2,147,483,647).

Depending on the platform the program runs on, you might have to use lseek64 or its equivalent. On Linux, the options are:

  • lseek() with

    #define _FILE_OFFSET_BITS 64

  • llseek()

  • lseek64()

Upvotes: 5
