Reputation: 3096
I'm building a simpler version of xxd for a school project and I'm getting hung up on the file output when reading binary files only (i.e. when I read plain text files, everything works as expected).
Expected output:
0000000: 504b 0304 1400 0000 0800 70b6 4746 562d PK........p.GFV-
0000010: e841 3600 0000 3f00 0000 0900 1c00 706c .A6...?.......pl
0000020: 6169 6e2e 7478 7455 5409 0003 7307 d754 ain.txtUT...s..T
0000030: ba1d d754 7578 0b00 0104 f501 0000 0414 ...Tux..........
0000040: 0000 000b c9c8 2c56 00a2 e2fc dc54 85e2 ......,V.....T..
0000050: c4dc 829c 5485 92d4 8a12 ae10 a844 625e ....T........Db^
0000060: 7e49 466a 9142 4e66 5eaa 4266 9e02 9003 ~IFj.BNf^.Bf....
0000070: 56a0 9096 9993 ca05 0050 4b01 021e 0314 V........PK.....
0000080: 0000 0008 0070 b647 4656 2de8 4136 0000 .....p.GFV-.A6..
0000090: 003f 0000 0009 0018 0000 0000 0001 0000 .?..............
00000a0: 00a4 8100 0000 0070 6c61 696e 2e74 7874 .......plain.txt
00000b0: 5554 0500 0373 07d7 5475 780b 0001 04f5 UT...s..Tux.....
00000c0: 0100 0004 1400 0000 504b 0506 0000 0000 ........PK......
00000d0: 0100 0100 4f00 0000 7900 0000 0000 ....O...y.....
Actual output:
0000000: 504b 0304 1400 0000 0800 70ffb6 4746 562d PK........p.GFV-
0000010: ffe841 3600 0000 3f00 0000 0900 1c00 706c .A6...?.......pl
0000020: 6169 6e2e 7478 7455 5409 0003 7307 ffd754 ain.txtUT...s..T
0000030: ffba1d ffd754 7578 0b00 0104 fff501 0000 0414 ...Tux..........
0000040: 0000 000b ffc9ffc8 2c56 00ffa2 ffe2fffc ffdc54 ff85ffe2 ......,V.....T..
0000050: ffc4ffdc ff82ff9c 54ff85 ff92ffd4 ff8a12 ffae10 ffa844 625e ....T........Db^
0000060: 7e49 466a ff9142 4e66 5effaa 4266 ff9e02 ff9003 ~IFj.BNf^.Bf....
0000070: 56ffa0 ff90ff96 ff99ff93 ffca05 0050 4b01 021e 0314 V........PK.....
0000080: 0000 0008 0070 ffb647 4656 2dffe8 4136 0000 .....p.GFV-.A6..
0000090: 003f 0000 0009 0018 0000 0000 0001 0000 .?..............
00000a0: 00ffa4 ff8100 0000 0070 6c61 696e 2e74 7874 .......plain.txt
00000b0: 5554 0500 0373 07ffd7 5475 780b 0001 04fff5 UT...s..Tux.....
00000c0: 0100 0004 1400 0000 504b 0506 0000 0000 ........PK......
00000d0: 0100 0100 4f00 0000 7900 0000 0000 0000 ....O...y.......
Here's a quick diff of the two files for easy reference.
I have a feeling it's the way I'm reading the files. I decided to stick with the C++ libraries and use std::ifstream to read the files. Here's my implementation:
void DumpUtility::dump(const char* filename) {
    std::ifstream file(filename, std::ifstream::in|std::ifstream::binary); // open file for reading
    if(file.is_open()) { // ensure file is open and ready to go
        std::cout << std::hex << std::setfill('0'); // pad PC with leading zeros
        char buffer[this->bytesPerLine]; // buffer symbols
        while(file.good()) {
            file.read(buffer, this->bytesPerLine);
            std::cout << std::setw(7) << this->pc << ":";
            for(unsigned int i = 0; i < this->bytesPerLine; i++) {
                if(i % 2 == 0) std::cout << " ";
                std::cout << std::setw(2) << (unsigned short)buffer[i];
            }
            std::cout << " ";
            for(unsigned int i = 0; i < this->bytesPerLine; i++) {
                if(isprint(buffer[i]) == 0) { // checks if character is printable
                    std::cout << ".";
                } else {
                    std::cout << buffer[i];
                }
            }
            std::cout << std::endl;
            this->pc += this->bytesPerLine;
        }
    } else {
        std::cerr << "Couldn't open file. General error..." << std::endl;
        exit(EXIT_FAILURE);
    }
    file.close();
}
So, file.read(buffer, this->bytesPerLine); is the line that reads the file, and I format the data as hex via iomanip. I have also tried using printf("%02X", (unsigned short)buffer[i]); with no luck: same output.
Despite building with clang++ -O0 -g -Wall -c, g++ -g -Wall -c, and g++, and debugging with lldb and gdb in order to see where exactly these extra F's come from, I found nothing. It seems like std::ifstream::read() is doing something other than simply storing the special characters as they are. Does anyone know what these extra F's represent, and can anyone point me in the right direction to resolve this?
Note: I'm trying to understand how to do this using std::ifstream as opposed to using cstdio. If worst comes to worst then I'll implement the method using the File IO utilities in cstdio instead. If I can't do this using ifstream then I'll gladly take the explanation so I can learn!
Upvotes: 0
Views: 2347
Reputation: 148870
Your problem is that your char type is signed. So when you write (unsigned short)buffer[i], it is converted as char -> int -> unsigned short. If the byte is at most 0x7f, it is seen as >= 0 and all is fine, but if not, it is sign-extended (internally padded with 1 bits) to form a negative int. Your first problem is on a b6 byte. What actually happens is:
b6 (signed char) -> FFFFFFb6 (signed int) -> FFB6 (unsigned short)
Fortunately, the fix is simple. You just have to write:
std::cout << std::setw(2) << (unsigned short) (unsigned char) buffer[i];
because now the conversion will correctly be:
b6 (signed char) -> b6 (unsigned char) -> b6 (signed int) -> b6 (unsigned short)
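To see the two conversion chains side by side, here is a minimal sketch. The helper names hex_sign_extended and hex_fixed are made up for this illustration and are not part of the question's code; the parameter is declared as signed char so the demonstration does not depend on whether plain char is signed on your platform, and it assumes the usual 16-bit unsigned short.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: formats one byte the way the question's code does.
// A negative value is sign-extended before it is narrowed to unsigned short.
std::string hex_sign_extended(signed char byte) {
    std::ostringstream out;
    out << std::hex << std::setfill('0') << std::setw(2)
        << static_cast<unsigned short>(byte);
    return out.str();
}

// Hypothetical helper: goes through unsigned char first, so the value
// stays in the range 0..255 and no extra ff digits appear.
std::string hex_fixed(signed char byte) {
    std::ostringstream out;
    out << std::hex << std::setfill('0') << std::setw(2)
        << static_cast<unsigned short>(static_cast<unsigned char>(byte));
    return out.str();
}
```

With the byte b6 (which is -74 as a signed char), the first helper should give "ffb6" and the second "b6". For bytes up to 7f, both give the same two digits.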
Upvotes: 2
Reputation: 5163
Try the following:
std::cout << std::setw(2) << ((unsigned short)buffer[i] & 0xFF);
Or
std::cout << std::setw(2) << (unsigned short)(unsigned char)buffer[i];
Or
unsigned char buffer[this->bytesPerLine];
...
std::cout << (char)buffer[i];
Also, isprint is used incorrectly with a signed char. The issue is similar: when the char value is negative, it is converted to a negative int, and not to a positive int with a value over 127 (which is what isprint expects). Passing a negative value other than EOF to isprint is undefined behavior.
So if you are using signed chars, the call should be:
if(isprint((unsigned char)buffer[i]) == 0)
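Putting both casts together, one line of the dump could be formatted roughly like this. This is only a sketch: format_line is a hypothetical helper, not something from the question, and it takes the number of bytes to print as a parameter. In the question's loop that count could come from file.gcount() after the read, which should also avoid printing leftover bytes when the last read is short.

```cpp
#include <cctype>
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper (not from the question): formats `count` bytes as
// xxd-style hex pairs, followed by their printable representation.
std::string format_line(const char* buffer, std::size_t count) {
    std::ostringstream out;
    out << std::hex << std::setfill('0');
    for (std::size_t i = 0; i < count; ++i) {
        if (i % 2 == 0) out << ' ';
        // Convert to unsigned char first so bytes above 0x7f are not sign-extended.
        const unsigned char byte = static_cast<unsigned char>(buffer[i]);
        out << std::setw(2) << static_cast<unsigned int>(byte);
    }
    out << ' ';
    for (std::size_t i = 0; i < count; ++i) {
        // isprint must receive a value in the unsigned char range (or EOF).
        const unsigned char byte = static_cast<unsigned char>(buffer[i]);
        out << (std::isprint(byte) ? static_cast<char>(byte) : '.');
    }
    return out.str();
}
```

For the four bytes 70 b6 47 46 from the first line of the dump, this should produce " 70b6 4746 p.GF" in the default "C" locale.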
Upvotes: 0