joe_chip

Reputation: 2558

Strict aliasing and binary I/O

Let's consider the following (simplified) code for reading contents of a binary file:

struct Header
{
    char signature[8];
    uint32_t version;
    uint32_t numberOfSomeChunks;
    uint32_t numberOfSomeOtherChunks;
};

void readFile(std::istream& stream)
{
    // find total size of the file, in bytes:
    stream.seekg(0, std::ios::end);
    const auto totalSize = static_cast<std::size_t>(stream.tellg());

    // allocate enough memory and read entire file
    std::unique_ptr<std::byte[]> fileBuf = std::make_unique<std::byte[]>(totalSize);
    stream.seekg(0);
    stream.read(reinterpret_cast<char*>(fileBuf.get()), totalSize);

    // get the header and do something with it:
    const Header* hdr = reinterpret_cast<const Header*>(fileBuf.get());

    if(hdr->version != expectedVersion) // <- Potential UB?
    {
        // report the error
    }

    // and so on...
}

The way I see this, the following line:

if(hdr->version != expectedVersion) // <- Potential UB?

contains undefined behavior: we're reading the version member, of type uint32_t, which is overlaid on top of an array of std::byte objects, and the compiler is free to assume that a uint32_t object does not alias anything else.

The question is: is my interpretation correct? If yes, what can be done to fix this code? If not, why is there no UB here?

Note 1: I understand the purpose of the strict aliasing rule (it lets the compiler avoid unnecessary loads from memory). I also know that std::memcpy would be a safe solution in this case, but using std::memcpy means additional memory allocations (on the stack, or on the heap if the size of the object is not known).
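For reference, the std::memcpy approach mentioned in the note could look roughly like the sketch below for the fixed-size header. copyHeader is a hypothetical helper name, not part of the original code, and the only extra storage it uses is a single Header with automatic storage duration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <type_traits>

struct Header
{
    char signature[8];
    std::uint32_t version;
    std::uint32_t numberOfSomeChunks;
    std::uint32_t numberOfSomeOtherChunks;
};

// std::memcpy into an object is only well-defined for trivially copyable types.
static_assert(std::is_trivially_copyable_v<Header>,
              "Header must be trivially copyable to be filled with memcpy");

// Hypothetical helper: copies the first sizeof(Header) bytes of the buffer
// into a real Header object. Returns std::nullopt if the buffer is too short.
std::optional<Header> copyHeader(const std::byte* buf, std::size_t size)
{
    assert(buf != nullptr);
    if (size < sizeof(Header))
        return std::nullopt;

    Header hdr;                          // a real Header object on the stack
    std::memcpy(&hdr, buf, sizeof hdr);  // copy the object representation
    return hdr;                          // members are read from hdr, not from buf
}
```

With this, the check becomes something like `if (auto hdr = copyHeader(fileBuf.get(), totalSize); hdr && hdr->version != expectedVersion)`, and no uint32_t is ever read through a pointer into the std::byte array.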

Upvotes: 1

Views: 358

Answers (2)

Language Lawyer

Reputation: 3569

what can be done to fix this code?

Wait until http://wg21.link/P0593, or something similar that allows implicit object creation in arrays of char/unsigned char/std::byte, is accepted.

Upvotes: 0

eerorika

Reputation: 238361

The question is: is my interpretation correct?

Yes.

If yes, what can be done to fix this code?

You already know that memcpy is a solution. You can, however, skip memcpy and the extra memory allocation by reading directly into the header object:

Header h;
stream.read(reinterpret_cast<char*>(&h), sizeof h);

Note that reading a binary file this way means that the integer representation used in the file (byte order, for example) must match the representation used by the CPU. This means that the file is not portable to systems with a different CPU architecture.
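If portability matters, one option is to decode each integer from its bytes explicitly instead of relying on the in-memory representation. The following is a minimal sketch under the assumption that the file stores integers least-significant byte first, with no padding between fields; decodeU32LE, parseHeader, readHeader and kHeaderBytes are hypothetical names introduced only for this illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <istream>
#include <optional>

struct Header
{
    char signature[8];
    std::uint32_t version;
    std::uint32_t numberOfSomeChunks;
    std::uint32_t numberOfSomeOtherChunks;
};

// Assumed on-disk size: 8 signature bytes followed by three 32-bit integers.
constexpr std::size_t kHeaderBytes = 8 + 3 * 4;

// Assembles a 32-bit value from 4 bytes stored least-significant byte first.
// The little-endian file layout is an assumption of this sketch.
std::uint32_t decodeU32LE(const unsigned char* p)
{
    assert(p != nullptr);
    return  static_cast<std::uint32_t>(p[0])
         | (static_cast<std::uint32_t>(p[1]) << 8)
         | (static_cast<std::uint32_t>(p[2]) << 16)
         | (static_cast<std::uint32_t>(p[3]) << 24);
}

// Builds a Header field by field from the raw on-disk bytes.
Header parseHeader(const std::array<unsigned char, kHeaderBytes>& raw)
{
    Header h{};
    std::memcpy(h.signature, raw.data(), sizeof h.signature);
    h.version                 = decodeU32LE(raw.data() + 8);
    h.numberOfSomeChunks      = decodeU32LE(raw.data() + 12);
    h.numberOfSomeOtherChunks = decodeU32LE(raw.data() + 16);
    return h;
}

// Reads exactly kHeaderBytes from the stream; std::nullopt on a short read.
std::optional<Header> readHeader(std::istream& stream)
{
    std::array<unsigned char, kHeaderBytes> raw{};
    if (!stream.read(reinterpret_cast<char*>(raw.data()),
                     static_cast<std::streamsize>(raw.size())))
        return std::nullopt;
    return parseHeader(raw);
}
```

Accessing the bytes as unsigned char is allowed by the aliasing rules, and the shifts produce the same value regardless of the host's byte order, so this version depends neither on struct padding nor on the CPU's integer representation.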

Upvotes: 3
