Reputation: 165
I'm trying to read a big binary file made of 30e6 positions, each with 195 doubles. Since the file is too big to read into memory all at once, I read it in chunks of 10000 positions, do some calculations on each chunk, and then read the next one.
Since I need random access to the file, I've written a function that reads a given chunk (unsigned int chunk) from the file and stores it in **chunk_data. The function returns the total number of positions read.
unsigned int read_chunk(double **chunk_data, unsigned int chunk) {
    FILE *in_glf_fh;
    unsigned int total_bytes_read = 0;

    // Define chunk start and end positions
    unsigned int start_pos = chunk * 10000;
    unsigned int end_pos = start_pos + 10000 - 1;
    unsigned int chunk_size = end_pos - start_pos + 1;

    // Open input file
    in_glf_fh = fopen(in_glf, "rb");
    if( in_glf_fh == NULL )
        error("ERROR: cannot open file!");

    // Search start position
    if( fseek(in_glf_fh, start_pos * 195 * sizeof(double), SEEK_SET) != 0 )
        error("ERROR: cannot seek file!");

    // Read data from file
    for(unsigned int c = 0; c < chunk_size; c++) {
        unsigned int bytes_read = fread ( (void*) chunk_data[c], sizeof(double), 195, in_glf_fh);
        if( bytes_read != 195 && !feof(in_glf_fh) )
            error("ERROR: cannot read file!");
        total_bytes_read += bytes_read;
    }

    fclose(in_glf_fh);
    return( total_bytes_read/195 );
}
The problem is that, after reading some chunks, fread() starts giving the wrong values!
Also, the chunk at which fread() starts behaving strangely depends on the chunk size:
chunk of 1 pos, wrong at chunk 22025475
chunk of 10000 pos, wrong at chunk 2203
chunk of 100000 pos, wrong at chunk 221
Does anyone have any idea what might be going on?
Upvotes: 2
Views: 1044
Reputation: 57774
Having worked out that 30e6 positions is not hex but 30,000,000, consider the limits of fseek(): the file has 46,800,000,000 bytes (30,000,000 x 195 x 8). Plain vanilla fseek() takes its offset as a long, so on platforms where long is 32 bits (typical 16- and 32-bit targets) it can only reach the first 2^31-1 bytes (2,147,483,647).
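To make the arithmetic concrete, here is a minimal sketch that computes a chunk's byte offset twice: once entirely in 64-bit arithmetic, and once with the element index held in 32 bits, as it would be in the question's `start_pos * 195` if unsigned int is 32 bits wide. The helper names `chunk_offset64` and `chunk_offset32` are made up for illustration, and the sketch assumes sizeof(double) is 8. It may also explain the reported failure points: chunks 2203 (10000 positions) and 221 (100000 positions) are the first ones for which the element index exceeds 2^32.

```c
#include <stdint.h>
#include <stddef.h>

#define DOUBLES_PER_POS 195u   /* from the question */

/* Byte offset of a chunk, computed entirely in 64-bit arithmetic. */
static uint64_t chunk_offset64(uint64_t chunk, uint64_t chunk_size) {
    return chunk * chunk_size * DOUBLES_PER_POS * sizeof(double);
}

/* Same computation, but the element index is held in 32 bits
 * (assumption: this models a platform where unsigned int is 32 bits).
 * The product wraps modulo 2^32 before it is widened. */
static uint64_t chunk_offset32(uint32_t chunk, uint32_t chunk_size) {
    uint32_t start_pos = chunk * chunk_size;
    uint32_t elements  = start_pos * DOUBLES_PER_POS;  /* may wrap */
    return (uint64_t)elements * sizeof(double);
}
```

Under these assumptions, chunk_offset64(2203, 10000) is 34,366,800,000 while chunk_offset32(2203, 10000) is only 7,061,632, so the seek would land near the start of the file; for chunk 2202 both versions agree.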
Depending on the platform the program runs on, you might have to use lseek64() or its equivalent. On Linux, the options are:
lseek() with #define _FILE_OFFSET_BITS 64
llseek()
lseek64()
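Since the question works with a FILE * rather than a file descriptor, a possible stdio counterpart to the options above is POSIX fseeko(), which takes an off_t offset. The following is only a sketch under assumptions: `seek_to_chunk` is a hypothetical helper, the layout parameters are taken from the question, and the code targets a POSIX system where `_FILE_OFFSET_BITS 64` makes off_t 64 bits wide.

```c
#define _FILE_OFFSET_BITS 64       /* request a 64-bit off_t; must precede all #includes */
#define _POSIX_C_SOURCE 200809L    /* expose fseeko()/ftello() */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: seek to the first byte of a chunk.
 * The offset is built from 64-bit operands so that no intermediate
 * product is truncated to 32 bits. Returns 0 on success, -1 on error. */
static int seek_to_chunk(FILE *fh, uint64_t chunk, uint64_t chunk_size,
                         uint64_t doubles_per_pos) {
    assert(fh != NULL);
    uint64_t offset = chunk * chunk_size * doubles_per_pos * sizeof(double);
    return fseeko(fh, (off_t)offset, SEEK_SET);
}
```

In the question's function this would replace the fseek() call, for example seek_to_chunk(in_glf_fh, chunk, 10000, 195). Taking the operands as uint64_t matters as much as the choice of seek function, because a 64-bit seek cannot recover an offset that was already truncated while it was being computed.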
Upvotes: 5