Frank Escobar

Reputation: 376

Issues reading a binary format [C++]

I need to parse a file with this specification:

V-Ray mesh format description (.vrmesh)

Parsing the first two values currently works using:

    char id[7];
    fgets( id, 7, file );
    uint32_t fileversion;
    char  bytesFileversion[4];
    fgets( bytesFileversion, 4, file );
    fileversion = bytesFileversion[3] | (bytesFileversion[2] << 8) | (bytesFileversion[1] << 16) | (bytesFileversion[0] << 24);

Both id and fileversion print correctly.

Unfortunately the next field is a long long int / uint64_t, and I tried to read it in several ways, like:

    uint64_t lookUpTable;
    char  bytes[8];
    fgets( bytes, 8, file );
    // It should work... but it actually doesn't work
    lookUpTable =    static_cast<uint64_t> (bytes[7]) |
                    (static_cast<uint64_t> (bytes[6]) << 8)  |
                    (static_cast<uint64_t> (bytes[5]) << 16) |
                    (static_cast<uint64_t> (bytes[4]) << 24) |
                    (static_cast<uint64_t> (bytes[3]) << 32) |
                    (static_cast<uint64_t> (bytes[2]) << 40) |
                    (static_cast<uint64_t> (bytes[1]) << 48) |
                    (static_cast<uint64_t> (bytes[0]) << 56);

But it didn't work. I've also found a Python implementation that parses the file correctly, https://github.com/bdancer/vray_tools/blob/master/VRayProxy.py, and I cannot figure out why it works and my implementation does not.

I tried modifying that .py file to read and print two uint32 values:

    self.lookupOffset = self.binRead("I", 4)[0]
    self.report("  lookupOffseta = %i" % self.lookupOffset)
    self.lookupOffset = self.binRead("I", 4)[0]
    self.report("  lookupOffsetb = %i" % self.lookupOffset)

and surprisingly I didn't get the same result as doing it in C++ with

    uint32_t lookUpTableA;
    char  bytesLookUpTableA[4];
    fgets( bytesLookUpTableA, 4, file );
    lookUpTableA = bytesLookUpTableA[3] | (bytesLookUpTableA[2] << 8) | (bytesLookUpTableA[1] << 16) | (bytesLookUpTableA[0] << 24);
    uint32_t lookUpTableB;
    char  bytesLookUpTableB[4];
    fgets( bytesLookUpTableB, 4, file );
    lookUpTableB = bytesLookUpTableB[3] | (bytesLookUpTableB[2] << 8) | (bytesLookUpTableB[1] << 16) | (bytesLookUpTableB[0] << 24);

So it is starting to seem like sorcery to me.

Thank you in advance for any tip!

PS. as reference here is a binary file with this format https://wetransfer.com/downloads/f055f43fcd82aa2c212d86482d4227a220180617175809/511208757a572c984ac1ad3a07f665e720180617175809/bfbc90

Upvotes: 0

Views: 192

Answers (2)

Retired Ninja

Reputation: 4924

This seems to work correctly for me to read the sample files from http://help.chaosgroup.com/vray/help/maya/sdk22/vrmesh_format.html except for armadillo.vrmesh as it uses a different id string. The version is consistent, and the offset is within the file bounds and consistently near the end. I couldn't find a simple description of the rest of the file content so I didn't try decoding it further.

#include <cstdint>
#include <iostream>
#include <fstream>
#include <string>
#include <vector>

struct VMeshInfo
{
    std::string filename;
    std::string id;
    uint32_t version;
    uint64_t lookup_offset;

    bool read(const std::string& name)
    {
        filename = name;
        std::ifstream f(filename, std::ios::binary);
        if (!f)
        {
            std::cerr << "Error opening file '" << filename << "'\n";
            return false;
        }
        char buffer[8] = { 0 };
        if (!f.read(buffer, 7))
        {
            std::cerr << "Error reading id from file '" << filename << "'\n";
            return false;
        }
        id = buffer;
        if (id != "vrmesh")
        {
            std::cerr << "id != 'vrmesh' in file '" << filename << "'\n";
            return false;
        }
        if (!f.read(reinterpret_cast<char*>(&version), sizeof version))
        {
            std::cerr << "Error reading version from file '" << filename << "'\n";
            return false;
        }
        if (!f.read(reinterpret_cast<char*>(&lookup_offset), sizeof lookup_offset))
        {
            std::cerr << "Error reading lookup_offset from file '" << filename << "'\n";
            return false;
        }
        return true;
    }

    void print(std::ostream& f)
    {
        f << "filename: " << filename << "\n";
        f << "id: " << id << "\n";
        f << "version: " << version << "\n";
        f << "lookup_offset: " << lookup_offset << "\n";
        f << "====================\n";
    }
};

int main()
{
    std::vector<std::string> files{ "cube.vrmesh", "cylinder_bend.vrmesh", "objects.vrmesh" };
    for (auto& filename : files)
    {
        VMeshInfo info;
        if (info.read(filename))
        {
            info.print(std::cout);
        }
        else
        {
            std::cerr << "Error reading file info!\n";
            continue;
        }
    }
    return 0;
}

Output:

filename: cube.vrmesh
id: vrmesh
version: 4096
lookup_offset: 798
====================
filename: cylinder_bend.vrmesh
id: vrmesh
version: 4096
lookup_offset: 271397
====================
filename: objects.vrmesh
id: vrmesh
version: 4096
lookup_offset: 68051
====================
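
A possible reason the fgets-based code in the question gives different numbers: fgets is meant for text lines, so it reads at most n-1 characters, stops after a newline, and appends a terminating '\0'. On top of that, plain char is signed on many platforms, so any byte of 0x80 or above is sign-extended when it is widened. The sketch below illustrates only the second effect; the function names are made up for this example, and signed char is used explicitly so the behaviour does not depend on how the platform defines plain char.

```cpp
#include <cstdint>

// Hypothetical helper: widen a byte held in a signed char, which is what
// plain char amounts to on many platforms. Negative values are sign-extended.
std::uint64_t widen_as_signed(signed char byte)
{
    return static_cast<std::uint64_t>(byte);
}

// Hypothetical helper: go through unsigned char first, so the value stays
// in the range 0..255 and the upper bits of the result remain zero.
std::uint64_t widen_as_unsigned(signed char byte)
{
    return static_cast<std::uint64_t>(static_cast<unsigned char>(byte));
}
```

With a byte whose bit pattern is 0x9C (the signed value -100), widen_as_signed returns 0xFFFFFFFFFFFFFF9C, which would swamp every other byte once the parts are OR-ed together, while widen_as_unsigned returns 0x9C.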

Upvotes: 1

Thomas Matthews

Reputation: 57688

Preface

Computers can order multi-byte integers in two layouts: Big Endian and Little Endian. Big Endian stores the most significant byte first; Little Endian stores the least significant byte first.
Find out which layout the binary file uses and which layout your platform uses. This is very important research to do before you start coding.
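
To check the platform side of that research at run time, one option is to look at how a known integer is laid out in memory. This is a minimal sketch; host_is_little_endian is a name invented for the example, not a standard function.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: returns true when the host stores the least
// significant byte of an integer at the lowest address.
bool host_is_little_endian()
{
    const std::uint32_t probe = 1;
    unsigned char first_byte = 0;
    // Copy only the first byte of the integer's in-memory representation.
    std::memcpy(&first_byte, &probe, 1);
    return first_byte == 1;
}
```

The layout of the file still has to come from the format description or from inspecting a sample in a hex editor.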

Binary Reading of 64-bit integer

I recommend reading the 64-bit data directly into the 64-bit integer:

uint64_t data;
data_stream.read(reinterpret_cast<char *>(&data), sizeof(data));

If your platform and the data file have the same integer layout, your work stops here.

Swapping the Bytes

If your platform byte ordering differs from the data, then you'll have to rearrange the bytes:

uint64_t data;
data_stream.read(reinterpret_cast<char *>(&data), sizeof(data));
uint64_t converted_data;
converted_data = (data & 0x00000000000000FF) << 56
               | (data & 0x000000000000FF00) << 40
               | (data & 0x0000000000FF0000) << 24
               | (data & 0x00000000FF000000) << 8
               | (data & 0x000000FF00000000) >> 8
               | (data & 0x0000FF0000000000) >> 24
               | (data & 0x00FF000000000000) >> 40
               | (data & 0xFF00000000000000) >> 56;

In the above snippet, both converted_data and data are of the same type, so no casts are necessary. There will be no alignment issues either.
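
An alternative that does not depend on the platform's byte order at all is to read the raw bytes into an unsigned char buffer and assemble the value with shifts. The sketch below assumes the file stores its integers least significant byte first, which should be confirmed against the format description; load_le is a hypothetical helper name, and count is assumed to be at most 8.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: assemble an unsigned integer from `count` bytes
// (count <= 8) that are stored least significant byte first.
std::uint64_t load_le(const unsigned char* bytes, std::size_t count)
{
    std::uint64_t value = 0;
    for (std::size_t i = 0; i < count; ++i)
    {
        // Byte i contributes bits 8*i .. 8*i+7 of the result.
        value |= static_cast<std::uint64_t>(bytes[i]) << (8 * i);
    }
    return value;
}
```

For example, the eight bytes 1E 03 00 00 00 00 00 00 decode to 798 and the four bytes 00 10 00 00 decode to 4096, on any host. Using unsigned char for the buffer also avoids sign extension when each byte is widened.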

Upvotes: 2
