luiserta

Reputation: 43

Read different data types in a line in C++

I'm just starting out in C++ and I need to read a binary file.

I know the structure of the file, i.e., each file line is composed of:

'double';'int8';'float32';'float32';'float32';'float32';'float32';'float32';'int8';'float32';'float32';'float32';'float32';'int8';'float32'

or, as sizes in bytes:

8 1 4 4 4 4 4 4 1 4 4 4 4 1 4

I wrote some code, but it feels outdated and clumsy. Here it is:

void test1 () {
const char *filePath = "C:\\20110527_phantom19.elm2";  // backslash must be escaped
double *doub;           
int *in;
float *fl;
FILE *file = NULL;     
unsigned char buffer;

if ((file = fopen(filePath, "rb")) == NULL) {
    cout << "Could not open specified file" << endl;
    return;  // nothing to read, so stop here
}
cout << "File opened successfully" << endl;

// Get the size of the file in bytes
long fileSize = getFileSize(file);
cout << "File size: " << fileSize;
cout << "\n";
// Allocate space in the buffer for the whole file
doub = new double[1];
in = new int[1];
fl = new float[1];
// Read the file in to the buffer
//fread(fileBuf, fileSize, 1, file);

//fscanf(file, "%g %d %g", doub[0],in[0],fl[0]);

fread(doub, 8, 1, file);
//cout << doub[0]<< " ";
fseek (file ,8, SEEK_SET);
fread(&buffer,1,1,file);
//printf("%d ",buffer);
fread(fl,4,1,file);
//cout << fl[0]<< " ";
fread(fl,4,1,file);
//cout << fl[0]<< " ";
fread(fl,4,1,file);
//cout << fl[0]<< " ";
fread(fl,4,1,file);
//cout << fl[0]<< " ";
fread(fl,4,1,file);
//cout << fl[0]<< " ";
fread(fl,4,1,file);
//cout << fl[0]<< " ";
fread(&buffer,1,1,file);
//printf("%d ",buffer);
fread(fl,4,1,file);
//cout << fl[0]<< " ";
fread(fl,4,1,file);
//cout << fl[0]<< " ";
fread(fl,4,1,file);
//cout << fl[0]<< " ";
fread(fl,4,1,file);
//cout << fl[0]<< " ";
fread(&buffer,1,1,file);
//printf("%d ",buffer);
fread(fl,4,1,file);
//cout << fl[0]<< "\n";

cin.get();
//delete[]fileBuf;
fclose(file); 
}

How can I make this more efficient?

Upvotes: 1

Views: 1518

Answers (3)

James Kanze

Reputation: 153899

In addition to the "structure" of the file, we need to know the format of the data types involved, and what you mean by "line", if the format isn't a text format. In general, however, you will have to read an appropriately sized block and then extract each value from it, according to the specified format. For integral values, it's fairly easy to extract an unsigned integral value using shifts; for int8, in fact, you just have to read the byte. On most machines, just casting the unsigned integer to the correspondingly sized signed type will work, although this is explicitly not guaranteed; if the unsigned char is greater than CHAR_MAX, you'll have to scale it down to get the appropriate value: something like -(UCHAR_MAX+1 - value) should do the trick (for chars; for larger types, you also have to worry about the fact that UINT_MAX+1 will overflow).
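As a minimal sketch of that scaling step (the function name toSigned8 is made up for illustration, and the comparison uses SCHAR_MAX rather than CHAR_MAX so that it also behaves on platforms where plain char is unsigned), assuming 8-bit bytes and a two's complement external format:

```cpp
#include <climits>

// Hypothetical helper: map a raw byte (0..UCHAR_MAX) to the signed value it
// encodes in two's complement, without relying on an out-of-range cast.
int toSigned8( unsigned char value )
{
    if ( value > SCHAR_MAX ) {
        // e.g. 255 -> -(256 - 255) = -1, 128 -> -(256 - 128) = -128
        return -static_cast<int>( UCHAR_MAX + 1 - value );
    }
    return value;
}
```

All the arithmetic is done in int, so nothing overflows for a single byte.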

If the external format is IEEE, and that's also what your machine uses (the usual case for Windows and Unix machines, but rarely the case for mainframes), then you can read an unsigned 4 or 8 byte integer (again, using shifts), and type pun it, something like:

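uint64_t
get64BitUInt( char const* buffer )
{
    // Mask each byte first (char may be signed), widen it to 64 bits,
    // then shift it into place; buffer[0] is the most significant byte.
    return
          (static_cast<uint64_t>( buffer[0] & 0xFF ) << 56)
        | (static_cast<uint64_t>( buffer[1] & 0xFF ) << 48)
        | (static_cast<uint64_t>( buffer[2] & 0xFF ) << 40)
        | (static_cast<uint64_t>( buffer[3] & 0xFF ) << 32)
        | (static_cast<uint64_t>( buffer[4] & 0xFF ) << 24)
        | (static_cast<uint64_t>( buffer[5] & 0xFF ) << 16)
        | (static_cast<uint64_t>( buffer[6] & 0xFF ) <<  8)
        | (static_cast<uint64_t>( buffer[7] & 0xFF )      );
}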

double
getDouble( char const* buffer )
{
    uint64_t retval = get64BitUInt( buffer );
    return *reinterpret_cast<double*>( &retval );
}

(This corresponds to the usual network byte order. If your binary format uses another convention, you'll have to adapt it. And the reinterpret_cast technically violates the aliasing rules, so whether it works depends on the implementation; you may have to rewrite it as:

double
getDouble( char const* buffer )
{
    union
    {
        double          d;
        uint64_t        i;
    }               results;
    results.i = get64BitUInt( buffer );
    return results.d;
}

Or use memcpy to copy from the uint64_t into a double, which avoids type punning altogether.)
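A minimal sketch of the memcpy variant, assuming the host uses IEEE 754 doubles and the file stores them big-endian. The names readUInt64BE and doubleFromBits are made up for this example, and the static_assert needs C++11:

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: assemble a 64-bit unsigned integer from 8 bytes,
// most significant byte first (network byte order).
std::uint64_t readUInt64BE( unsigned char const* buffer )
{
    std::uint64_t result = 0;
    for ( int i = 0; i < 8; ++i ) {
        result = (result << 8) | buffer[i];
    }
    return result;
}

// Hypothetical helper: reinterpret the 64 bits as a double by copying the
// bytes, which avoids both the pointer cast and the union.
double doubleFromBits( std::uint64_t bits )
{
    static_assert( sizeof( double ) == sizeof( std::uint64_t ),
                   "this sketch assumes an 8-byte double" );
    double d;
    std::memcpy( &d, &bits, sizeof d );
    return d;
}
```

Usage would be along the lines of doubleFromBits( readUInt64BE( p ) ), where p points at the 8 bytes of the field. Compilers typically turn the memcpy into a plain register move.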

If your machine doesn't use IEEE floating point, and the external format is IEEE, you'll have to pick up the 8 byte word as an 8 byte unsigned int (unsigned long long), then extract the sign, exponent and mantissa according to the IEEE format; something like the following:

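double
getDouble( char const* buffer )
{
    // Note: denormals, infinities and NaNs are not handled here.
    uint64_t            tmp( get64BitUInt( buffer ) );
    double              f = 0.0 ;
    if ( (tmp & 0x7FFFFFFFFFFFFFFFULL) != 0 ) {
        // 53-bit mantissa (with the implicit leading 1) scaled by the
        // unbiased exponent: bias 1023, plus 52 fraction bits.
        f = ldexp( static_cast<double>(
                       (tmp & 0x000FFFFFFFFFFFFFULL) | 0x0010000000000000ULL ),
                   (int)((tmp & 0x7FF0000000000000ULL) >> 52) - 1022 - 53 ) ;
    }
    if ( (tmp & 0x8000000000000000ULL) != 0 ) {
        f = -f ;
    }
    return f;
}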

Don't do this until you're sure you'll need it, however.

Upvotes: 1

Jack

Reputation: 133567

What's the problem when you can easily read whole structs with your custom format and have the fields automatically filled with correct values?

struct MyDataFormat {
  double d;
  int8 i1;
  float32 f[6];
  ..
};

MyDataFormat buffer;

fread(&buffer, sizeof(MyDataFormat), 1, file);
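One caveat with this approach: compilers normally insert padding after the int8 fields so that the floats are aligned, which makes sizeof larger than the 55 bytes of a record in the file. A rough sketch of one way around it, assuming a compiler that supports #pragma pack (MSVC, GCC and Clang do), an 8-byte double, a 4-byte float, and a file written in the host's byte order. PackedRecord, its member names and readPackedRecord are invented for illustration:

```cpp
#include <cstdint>
#include <cstdio>

#pragma pack(push, 1)          // no padding between members
struct PackedRecord {          // hypothetical name and field names
    double       d;            // 8 bytes
    std::int8_t  i1;           // 1 byte
    float        f1[6];        // 6 * 4 bytes
    std::int8_t  i2;           // 1 byte
    float        f2[4];        // 4 * 4 bytes
    std::int8_t  i3;           // 1 byte
    float        f3;           // 4 bytes
};
#pragma pack(pop)

// Fails to compile if the layout does not match the 55-byte record.
static_assert( sizeof( PackedRecord ) == 55, "record must be 55 bytes" );

// Read one record from the file; returns true if a full record was read.
bool readPackedRecord( std::FILE* file, PackedRecord& out )
{
    return std::fread( &out, sizeof out, 1, file ) == 1;
}
```

The static_assert is the important part: if the struct and the file layout ever disagree, the build breaks instead of the data silently shifting.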

Upvotes: 2

Martin Beckett

Reputation: 96109

If each line has the same format, I would probably read a line at a time into a buffer and then have a function that pulls that buffer apart into separate elements. That is easier to understand, easier to test, works with larger files, and is possibly more efficient because it does fewer reads.
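A minimal sketch of that idea, assuming each record is the 55 bytes listed in the question, the file uses the host's byte order and float formats, and double and float are 8 and 4 bytes. Record, its field names, take, parseRecord and readNextRecord are all hypothetical names for this example:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstring>

const std::size_t kRecordSize = 55;  // 8 + 1 + 6*4 + 1 + 4*4 + 1 + 4

struct Record {                 // hypothetical in-memory layout
    double       time;          // the leading double
    std::int8_t  flags[3];      // the three int8 fields, in file order
    float        values[11];    // the eleven float32 fields, in file order
};

// Copy one field out of the buffer and return the position after it.
template <typename T>
unsigned char const* take( unsigned char const* p, T& out )
{
    std::memcpy( &out, p, sizeof out );
    return p + sizeof out;
}

// Pull a raw 55-byte record apart into its fields.
Record parseRecord( unsigned char const* p )
{
    Record r;
    int v = 0;
    p = take( p, r.time );
    p = take( p, r.flags[0] );
    for ( int i = 0; i < 6; ++i ) p = take( p, r.values[v++] );
    p = take( p, r.flags[1] );
    for ( int i = 0; i < 4; ++i ) p = take( p, r.values[v++] );
    p = take( p, r.flags[2] );
    take( p, r.values[v] );
    return r;
}

// One fread per record; returns false at end of file or on a short read.
bool readNextRecord( std::FILE* file, Record& out )
{
    unsigned char buffer[kRecordSize];
    if ( std::fread( buffer, 1, kRecordSize, file ) != kRecordSize ) {
        return false;
    }
    out = parseRecord( buffer );
    return true;
}
```

Calling readNextRecord in a while loop processes the file record by record, and parseRecord can be tested on a hand-built buffer without touching the disk.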

Upvotes: 1
