Reputation: 1127
The put/get methods of the std::fstream classes operate on char arguments rather than ints.
Is there a portable way of representing these char bytes as integers?
(My naive expectation is that a binary file is a sequence of bytes,
i.e. a sequence of integers).
To make this question more concrete, consider the following two functions:
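#include <fstream>
#include <iostream>
#include <string>

// Print every byte of the file as a decimal integer, one per line.
void print_binary_file_to_cout(const std::string &filename)
{
    std::ifstream ifs(filename, std::ios_base::binary | std::ios_base::in);
    char c;
    while (ifs.get(c))
        std::cout << static_cast<int>(c) << std::endl;
}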
and
// Read integers from std::cin and write each one as a byte; 256 ends the input.
void make_binary_file_from_cin(const std::string &filename)
{
    std::ofstream ofs(filename, std::ios_base::binary | std::ios_base::out);
    const int no_char = 256;
    int cInt = no_char;
    while (std::cin >> cInt && cInt != no_char)
        ofs.put(static_cast<char>(cInt));
}
Now suppose that one function is compiled on Windows with Visual Studio and the other with gcc on Linux. If the output of print...() is given as the input to make...(), will the original file be reproduced?
I guess not, so I'm asking how to correctly implement this idea, i.e. how to get a portable (and human-understandable) representation of bytes in binary files?
Upvotes: 1
Views: 230
Reputation: 5613
There is a lot of code out there that presumes the char functions will work correctly with unsigned char variables, perhaps with a static_cast, on the assumption that the two forms are bit-identical. The language lawyers will say that this assumption can't be relied on if you are writing "perfect" portable code.
Luckily, reinterpret_cast lets you cast any object pointer to a pointer to signed or unsigned char, and that is the easiest way out.
Two notes to consider for all binary files:
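On Windows the file must be opened in binary mode; otherwise line-ending translation applies. On reading, a byte 13 (CR) followed by a byte 10 (LF) is collapsed to a single LF, so bytes with code 13 mysteriously disappear, and on writing a 13 is inserted before every 10.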
To store numbers larger than 255 you will need to string together several byte values. You need to decide the convention for doing this: whether the first byte is the least or the most significant part of the value. Some architectures (68K, for example) use the "big end" model, where the most significant byte comes first, while Intel uses the "little end" model; ARM can run in either mode and is most commonly little-endian. If you are reading byte by byte, the processor's convention does not matter: you just have to specify the order for your file format and stick to it.
Upvotes: 1
Reputation: 52367
The most common human-readable representation of bytes is hex (base 16) notation. You can tell iostreams to use hex format by passing std::hex into the stream. std::hex modifies the stream's behavior accordingly, for both input and output streams. The format is also independent of compilers and platforms. If every byte is written as exactly two hex digits you do not need a separator (like newline) between values, although you then have to read two characters at a time; with a whitespace separator you can extract the values directly with operator>>. As a stop value, you can use any character outside [0-9a-fA-F].
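Note that you should use unsigned chars: convert each char to unsigned char before converting it to an integer, so that every byte prints as a value from 00 to ff. Where char is signed, a byte such as 0x80 would otherwise be sign-extended and print as something like ffffff80.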
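A minimal sketch of the hex round trip, assuming one two-digit value per line as the text format (dump_hex and restore_from_hex are hypothetical names, and the newline separator is a choice made here so that operator>> can be used for reading):

```cpp
#include <iomanip>
#include <istream>
#include <ostream>

// Hypothetical helper: print each byte of `in` as two hex digits, one per line.
void dump_hex(std::istream &in, std::ostream &out)
{
    const std::ios_base::fmtflags saved_flags = out.flags();
    const char saved_fill = out.fill();
    out << std::hex << std::setfill('0');
    char c;
    while (in.get(c))
        // Go through unsigned char first so the value is always 0..255.
        out << std::setw(2) << static_cast<int>(static_cast<unsigned char>(c)) << '\n';
    out.flags(saved_flags);
    out.fill(saved_fill);
}

// Hypothetical helper: rebuild bytes from whitespace-separated hex numbers.
// Stops at the first token that is not a hex number in the range 0..ff.
void restore_from_hex(std::istream &in, std::ostream &out)
{
    const std::ios_base::fmtflags saved_flags = in.flags();
    in >> std::hex;
    unsigned int value;
    while (in >> value && value <= 0xFFu) {
        const unsigned char byte = static_cast<unsigned char>(value);
        out.write(reinterpret_cast<const char *>(&byte), 1);
    }
    in.flags(saved_flags);
}
```

The text produced by dump_hex on one platform can be fed to restore_from_hex on another, provided both binary files are opened in binary mode.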
Upvotes: 1