Reputation: 43
I'm new to C++ and I need to read a binary file.
I know the structure of the file, i.e., each record ("line") in the file is composed of:
'double';'int8';'float32';'float32';'float32';'float32';'float32';'float32';'int8';'float32';'float32';'float32';'float32';'int8';'float32'
or, as sizes in bytes:
8 1 4 4 4 4 4 4 1 4 4 4 4 1 4
I wrote some code, but it is clumsy and outdated. Here is the code:
void test1 () {
    // Backslashes in a string literal must be escaped
    const char *filePath = "C:\\20110527_phantom19.elm2";
    double *doub;
    int *in;
    float *fl;
    FILE *file = NULL;
    unsigned char buffer;
    if ((file = fopen(filePath, "rb")) == NULL) {
        cout << "Could not open specified file" << endl;
        return;  // nothing to read if the file could not be opened
    }
    cout << "File opened successfully" << endl;
    // Get the size of the file in bytes
    long fileSize = getFileSize(file);
    cout << "File size: " << fileSize;
    cout << "\n";
    // Allocate space in the buffer for the whole file
    doub = new double[1];
    in = new int[1];
    fl = new float[1];
    // Read the file in to the buffer
    //fread(fileBuf, fileSize, 1, file);
    //fscanf(file, "%g %d %g", doub[0],in[0],fl[0]);
    fread(doub, 8, 1, file);
    //cout << doub[0]<< " ";
    fseek (file ,8, SEEK_SET);
    fread(&buffer,1,1,file);
    //printf("%d ",buffer);
    fread(fl,4,1,file);
    //cout << fl[0]<< " ";
    fread(fl,4,1,file);
    //cout << fl[0]<< " ";
    fread(fl,4,1,file);
    //cout << fl[0]<< " ";
    fread(fl,4,1,file);
    //cout << fl[0]<< " ";
    fread(fl,4,1,file);
    //cout << fl[0]<< " ";
    fread(fl,4,1,file);
    //cout << fl[0]<< " ";
    fread(&buffer,1,1,file);
    //printf("%d ",buffer);
    fread(fl,4,1,file);
    //cout << fl[0]<< " ";
    fread(fl,4,1,file);
    //cout << fl[0]<< " ";
    fread(fl,4,1,file);
    //cout << fl[0]<< " ";
    fread(fl,4,1,file);
    //cout << fl[0]<< " ";
    fread(&buffer,1,1,file);
    //printf("%d ",buffer);
    fread(fl,4,1,file);
    //cout << fl[0]<< "\n";
    cin.get();
    //delete[]fileBuf;
    fclose(file);
}
How can I make this more efficient?
Upvotes: 1
Views: 1518
Reputation: 153899
In addition to the "structure" of the file, we need to know the format
of the data types involved, and what you mean by "line", if the format
isn't a text format. In general, however, you will have to read an
appropriately sized block and then extract each value from it,
according to the specified format. For integral values, it's fairly
easy to extract an unsigned integral value using shifts; for int8, in
fact, you just have to read the byte. For most machines, just casting
the unsigned integer into the correspondingly sized signed type will
work, although this is explicitly not guaranteed; if the unsigned char
is greater than SCHAR_MAX, you'll have to scale it down to get the
appropriate value: something like -(UCHAR_MAX+1 - value) should do the
trick (for chars; for larger types, you also have to worry about
the fact that UINT_MAX+1 will overflow).
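As a minimal sketch of the int8 case described above, assuming 8-bit bytes and a two's complement external format: the helper name toInt8 is hypothetical, and SCHAR_MAX is used as the threshold because the target type is signed.

```cpp
#include <climits>
#include <cstdint>

// Convert one raw byte from the file into a signed 8-bit value without
// relying on an out-of-range narrowing conversion.
// toInt8 is a hypothetical helper name used only for illustration.
std::int8_t toInt8( unsigned char value )
{
    if ( value > SCHAR_MAX ) {
        // Bytes 128..255 stand for -128..-1 in two's complement.
        int scaled = -( UCHAR_MAX + 1 - value );
        return static_cast<std::int8_t>( scaled );
    }
    return static_cast<std::int8_t>( value );
}
```

The arithmetic is done in int, so the intermediate UCHAR_MAX+1 cannot overflow here; for wider types the same trick needs a wider intermediate type.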
If the external format is IEEE, and that's also what your machine uses (the usual case for Windows and Unix machines, but rarely the case for mainframes), then you can read an unsigned 4 or 8 byte integer (again, using shifts), and type pun it, something like:
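uint64_t
get64BitUInt( char const* buffer )
{
    //  Mask each byte to 8 bits (char may be signed) and widen it
    //  to 64 bits before shifting; most significant byte first.
    return (static_cast<uint64_t>( buffer[0] & 0xFF ) << 56)
        |  (static_cast<uint64_t>( buffer[1] & 0xFF ) << 48)
        |  (static_cast<uint64_t>( buffer[2] & 0xFF ) << 40)
        |  (static_cast<uint64_t>( buffer[3] & 0xFF ) << 32)
        |  (static_cast<uint64_t>( buffer[4] & 0xFF ) << 24)
        |  (static_cast<uint64_t>( buffer[5] & 0xFF ) << 16)
        |  (static_cast<uint64_t>( buffer[6] & 0xFF ) <<  8)
        |  (static_cast<uint64_t>( buffer[7] & 0xFF )      );
}
double
getDouble( char const* buffer )
{
    uint64_t retval = get64BitUInt( buffer );
    return *reinterpret_cast<double*>( &retval );
}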
(This corresponds to the usual network byte order. If your binary format
uses another convention, you'll have to adapt it. And the
reinterpret_cast depends on implementation defined behavior; you may
have to rewrite it as:
double
getDouble( char const* buffer )
{
    union
    {
        double d;
        uint64_t i;
    } results;
    results.i = get64BitUInt( buffer );
    return results.d;
}
Or even use memcpy to copy from a uint64_t into a double.)
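A minimal sketch of the memcpy variant mentioned above, assuming the file stores IEEE doubles in big-endian (network) byte order and that the host's double is a 64-bit IEEE type. The names readBigEndian64 and doubleFromBits are hypothetical; the buffer is taken as unsigned char so that no sign extension can occur.

```cpp
#include <cstdint>
#include <cstring>

// Assemble a 64-bit unsigned value from 8 bytes, most significant first.
// (Hypothetical helper; adapt the loop if the file is little-endian.)
std::uint64_t readBigEndian64( unsigned char const* buffer )
{
    std::uint64_t result = 0;
    for ( int i = 0; i < 8; ++i ) {
        result = (result << 8) | buffer[i];
    }
    return result;
}

// Reinterpret the bit pattern as a double by copying the bytes,
// which avoids both the pointer cast and the union.
double doubleFromBits( std::uint64_t bits )
{
    static_assert( sizeof( double ) == sizeof( std::uint64_t ),
                   "this sketch assumes a 64-bit double" );
    double value;
    std::memcpy( &value, &bits, sizeof value );
    return value;
}
```

Compilers generally turn the fixed-size memcpy into a plain register move, so this form should cost nothing compared with the cast.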
If your machine doesn't use IEEE floating point, and the external format
is IEEE, you'll have to pick up the 8 byte word as an 8 byte unsigned
int (unsigned long long), then extract the sign, exponent and mantissa
according to the IEEE format; something like the following:
double
getDouble( char const* buffer )
{
    //  Needs <cmath> for ldexp. Handles zero and normalized values
    //  only; denormals, infinities and NaNs are not treated specially.
    uint64_t tmp( get64BitUInt( buffer ) );
    double f = 0.0 ;
    if ( (tmp & 0x7FFFFFFFFFFFFFFF) != 0 ) {
        f = ldexp( ((tmp & 0x000FFFFFFFFFFFFF) | 0x0010000000000000),
                   (int)((tmp & 0x7FF0000000000000) >> 52) - 1022 - 53 ) ;
    }
    if ( (tmp & 0x8000000000000000) != 0 ) {
        f = -f ;
    }
    return f;
}
Don't do this until you're sure you'll need it, however.
Upvotes: 1
Reputation: 133567
Why not simply read whole structs that mirror your custom format, so that the fields are filled in with the correct values automatically?
struct MyDataFormat {
    double d;
    int8 i1;
    float32 f[6];
    ..
};
MyDataFormat buffer;
fread(&buffer, sizeof(MyDataFormat), 1, file);
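One caveat with reading a struct directly: compilers typically insert padding after the 1-byte fields, so the struct would be larger than the 55 bytes (8+1+24+1+16+1+4) of one record. The following is only a sketch under several assumptions: the compiler supports #pragma pack (MSVC, GCC and Clang do), the file uses the host's byte order and IEEE floating point, and the names PackedRecord, readPackedRecord and the field names are made up for illustration.

```cpp
#include <cstdint>
#include <cstdio>

#pragma pack(push, 1)      // no padding between fields
struct PackedRecord {      // hypothetical name and field names
    double      d;
    std::int8_t i1;
    float       f1[6];
    std::int8_t i2;
    float       f2[4];
    std::int8_t i3;
    float       f3;
};
#pragma pack(pop)

// Fails to compile if the layout does not match the 55-byte record.
static_assert( sizeof( PackedRecord ) == 55, "unexpected record size" );

// Read one record; returns false at end of file or on a short read.
bool readPackedRecord( std::FILE* file, PackedRecord& record )
{
    return std::fread( &record, sizeof record, 1, file ) == 1;
}
```

Calling readPackedRecord in a loop until it returns false would walk through the whole file, one record per fread.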
Upvotes: 2
Reputation: 96109
If each line has the same format, I would probably read one line at a time into a buffer and then have a function that pulls that buffer apart into separate elements. That is easier to understand, easier to test, works with larger files, and is possibly more efficient because it does fewer reads.
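A rough sketch of that approach, assuming the 55-byte record layout from the question and a file written with the host's byte order and floating point format. The names Record, extract, parseRecord and readRecord, and the field names, are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstring>

const std::size_t kRecordSize = 55;   // 8+1+24+1+16+1+4 bytes

struct Record {                       // ordinary struct, padding is fine
    double      d;
    std::int8_t i1;
    float       f1[6];
    std::int8_t i2;
    float       f2[4];
    std::int8_t i3;
    float       f3;
};

// Copy one field out of the buffer and advance the offset past it.
template <typename T>
void extract( unsigned char const* buffer, std::size_t& offset, T& field )
{
    std::memcpy( &field, buffer + offset, sizeof field );
    offset += sizeof field;
}

// Pull a raw 55-byte record apart into its separate elements.
Record parseRecord( unsigned char const* buffer )
{
    Record r;
    std::size_t offset = 0;
    extract( buffer, offset, r.d );
    extract( buffer, offset, r.i1 );
    extract( buffer, offset, r.f1 );  // copies all 6 floats at once
    extract( buffer, offset, r.i2 );
    extract( buffer, offset, r.f2 );  // copies all 4 floats at once
    extract( buffer, offset, r.i3 );
    extract( buffer, offset, r.f3 );
    return r;
}

// One fread per record; returns false at end of file or on a short read.
bool readRecord( std::FILE* file, Record& record )
{
    unsigned char buffer[kRecordSize];
    if ( std::fread( buffer, 1, kRecordSize, file ) != kRecordSize ) {
        return false;
    }
    record = parseRecord( buffer );
    return true;
}
```

Because parseRecord works on a plain byte buffer, it can be tested without any file at all, and byte swapping could later be added inside extract if the file turns out to use a different byte order.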
Upvotes: 1