Adrian

Reputation: 1127

How to handle binary files in a portable way using std::fstream?

The put/get methods of the std::fstream classes operate on char arguments rather than ints. Is there a portable way of representing these char-bytes as integers? (My naive expectation is that a binary file is a sequence of bytes, i.e. a sequence of integers.)

To make this question more concrete, consider the following two functions:

void print_binary_file_to_cout( const std::string &filename)
{
    std::ifstream ifs(filename, std::ios_base::binary|std::ios_base::in);
    char c; 
    while(ifs.get(c))
        std::cout << static_cast<int>(c) << std::endl;
}

and

void make_binary_file_from_cin( const std::string &filename)
{ 
    std::ofstream ofs(filename, std::ios_base::binary|std::ios_base::out);
    const int no_char = 256;
    int cInt = no_char; 
    while(std::cin>>cInt && cInt!=no_char )
        ofs.put( static_cast<char>( cInt ) );
}

Now, suppose that one function is compiled with Visual Studio on Windows, and the other with gcc on Linux. If the output of print_binary_file_to_cout() is given as the input to make_binary_file_from_cin(), will the original file be reproduced?

I guess not, so I'm asking how to implement this idea correctly, i.e. how to get a portable (and human-readable) representation of the bytes in a binary file.

Upvotes: 1

Views: 230

Answers (2)

Gem Taylor

Reputation: 5613

There is a lot of code out there that presumes the char functions will work correctly with unsigned char variables, perhaps with a static_cast, i.e. that the two forms are bit-identical, but the language lawyers will say that this assumption can't be relied on if you are writing "perfect" portable code.

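Luckily, reinterpret_cast does offer the facility to cast any object pointer into a pointer to char or unsigned char (the standard explicitly allows accessing an object's bytes through such a pointer), and that is the easiest get-out: keep your data in unsigned char and cast the buffer pointer to char* when calling read() or write().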
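A minimal sketch of that get-out, assuming C++11 or later. The helper names read_binary_file and write_binary_file are hypothetical: the data lives in a std::vector<unsigned char>, and only the pointer handed to read()/write() is reinterpreted as char*. Both files are opened with std::ios_base::binary so no newline translation takes place.

```cpp
#include <fstream>
#include <ios>
#include <string>
#include <vector>

// Hypothetical helper: read a whole file as raw bytes.
// The stream API works in char, so the unsigned char buffer is handed to
// read() through reinterpret_cast<char*>.
std::vector<unsigned char> read_binary_file(const std::string& filename)
{
    // std::ios_base::binary disables newline translation (matters on Windows).
    std::ifstream ifs(filename, std::ios_base::binary);
    std::vector<unsigned char> bytes;
    unsigned char buf[4096];
    for (;;) {
        ifs.read(reinterpret_cast<char*>(buf), sizeof buf);
        // gcount() is the number of bytes actually read, even on a short final read.
        std::streamsize n = ifs.gcount();
        bytes.insert(bytes.end(), buf, buf + n);
        if (!ifs)  // end of file reached (or the file could not be opened)
            break;
    }
    return bytes;
}

// Hypothetical helper: write raw bytes; returns false if the stream failed.
bool write_binary_file(const std::string& filename, const std::vector<unsigned char>& bytes)
{
    std::ofstream ofs(filename, std::ios_base::binary);
    ofs.write(reinterpret_cast<const char*>(bytes.data()),
              static_cast<std::streamsize>(bytes.size()));
    ofs.close();  // flush now so a write error is visible before returning
    return static_cast<bool>(ofs);
}
```

Writing all 256 byte values with write_binary_file and reading them back with read_binary_file should give an identical vector, because nothing in between depends on whether plain char is signed.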

Two notes to consider for all binary files:

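On Windows the file must be opened in binary mode, otherwise bytes with code 13 will mysteriously disappear: in text mode a 13 (carriage return) followed by a 10 (line feed) is read back as a lone 10, and every 10 you write gains a 13 in front of it.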

To store numbers larger than 255 you will need to combine several byte values, and you need to decide the convention for doing this: whether the first byte is the least or the most significant part of the value. Certain architectures (such as the 68K) use the "big-endian" model, where the most significant byte comes first, while Intel x86 uses the "little-endian" model; ARM can be switched between the two. If you assemble values byte by byte with shifts, the host's own byte order no longer matters; you just have to specify the order as part of your file format.
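As an illustration, here is a sketch of a fixed big-endian convention for 32-bit values. The function names to_big_endian and from_big_endian are hypothetical. Because the bytes are produced and consumed with shifts on values rather than by copying memory, the file layout is the same on big- and little-endian hosts.

```cpp
#include <array>
#include <cstdint>

// Hypothetical helpers: store a 32-bit value most significant byte first
// ("big-endian"). Shifts operate on values, not memory, so the result is
// the same whatever the host CPU's own byte order is.
std::array<unsigned char, 4> to_big_endian(std::uint32_t value)
{
    return {{
        static_cast<unsigned char>((value >> 24) & 0xFFu),
        static_cast<unsigned char>((value >> 16) & 0xFFu),
        static_cast<unsigned char>((value >> 8) & 0xFFu),
        static_cast<unsigned char>(value & 0xFFu),
    }};
}

std::uint32_t from_big_endian(const std::array<unsigned char, 4>& b)
{
    // Widen each byte to uint32_t before shifting: a plain unsigned char would
    // be promoted to int, and shifting 0xFF left by 24 overflows a 32-bit int.
    return (static_cast<std::uint32_t>(b[0]) << 24) |
           (static_cast<std::uint32_t>(b[1]) << 16) |
           (static_cast<std::uint32_t>(b[2]) << 8) |
            static_cast<std::uint32_t>(b[3]);
}
```

For example, 0x12345678 is stored as the bytes 12 34 56 78. A little-endian format would simply emit the same four bytes in reverse order; either is fine as long as reader and writer agree.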

Upvotes: 1

ypnos

Reputation: 52367

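The most common human-readable representation of bytes is hex (base 16) notation. You can tell iostreams to use hex format by passing std::hex into the stream; std::hex modifies the stream's behavior accordingly for both input and output streams. This format is also independent of compilers and platforms. Since every byte fits in exactly two hex digits you could in principle omit separators between values, but operator>> with std::hex keeps consuming digits until it reaches a non-hex character, so when reading back with operator>> a whitespace separator (like a newline) between values is the simplest choice. As a stop value, you can use any non-whitespace character outside [0-9a-fA-F].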
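A sketch of that approach, assuming one value per line and "." as the stop value. The names bytes_to_hex, hex_to_bytes and hex_dump are hypothetical. The writer prints each byte as two zero-padded hex digits; the reader uses std::hex with operator>> and stops at the first non-hex token or at end of input. The byte is written back through reinterpret_cast, so no signed char conversion is involved.

```cpp
#include <iomanip>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: write every byte of `in` as two lowercase hex digits,
// one value per line. Note: leaves std::hex and the '0' fill set on `out`.
void bytes_to_hex(std::istream& in, std::ostream& out)
{
    out << std::hex << std::setfill('0');
    char c;
    while (in.get(c)) {
        // Go through unsigned char so bytes >= 0x80 print as 80..ff, not negative.
        out << std::setw(2) << static_cast<unsigned>(static_cast<unsigned char>(c)) << '\n';
    }
}

// Hypothetical helper: read whitespace-separated hex values until a non-hex
// token (e.g. ".") or end of input, writing each one back as a single byte.
void hex_to_bytes(std::istream& in, std::ostream& out)
{
    unsigned value;
    while (in >> std::hex >> value && value <= 0xFFu) {
        unsigned char byte = static_cast<unsigned char>(value);
        out.write(reinterpret_cast<const char*>(&byte), 1);
    }
}

// Convenience wrapper: hex dump of an in-memory byte string.
std::string hex_dump(const std::string& bytes)
{
    std::istringstream in(bytes);
    std::ostringstream out;
    bytes_to_hex(in, out);
    return out.str();
}
```

For real files, the input of bytes_to_hex and the output of hex_to_bytes would be std::ifstream/std::ofstream objects opened with std::ios_base::binary, as in the question's code; std::cin and std::cout themselves are text-mode streams on Windows.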

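Note that you should use unsigned char: converting a plain char straight to int gives negative values for bytes above 127 wherever char is signed, and an unsigned char must still be cast to an integer type before inserting it into a stream, since operator<< prints any char type as a character rather than a number.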
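A short sketch of the conversion this note refers to. The names byte_value and print_byte_values are hypothetical. Plain char is typically signed with MSVC and with gcc on x86, but unsigned with gcc on ARM Linux, so static_cast<int>(c) in the question's print loop can yield -1 on one platform and 255 on another for the same byte. Converting through unsigned char first gives 0..255 everywhere.

```cpp
#include <istream>
#include <ostream>

// Hypothetical helper: the value (0..255) of a byte read into a plain char.
// static_cast<int>(c) alone would give -1 for byte 0xFF wherever char is
// signed, but 255 wherever it is unsigned.
int byte_value(char c)
{
    return static_cast<unsigned char>(c);  // unsigned char promotes to int 0..255
}

// The question's print loop, using byte_value() so the decimal output is
// the same on every platform.
void print_byte_values(std::istream& in, std::ostream& out)
{
    char c;
    while (in.get(c))
        out << byte_value(c) << '\n';
}
```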

Upvotes: 1
