Reputation: 125
I wrote the following program to translate a hex string into the corresponding binary data.
#include <stdio.h>
#include <stdlib.h>
int main(void) {
    char bf[3];
    char b; /* each byte */
    bf[0] = bf[1] = bf[2] = 0;
    for (;;) {
        for (;;) {
            bf[0] = getchar();
            if (isspace(bf[0])) continue;
            if (bf[0] == EOF) goto end;
            break;
        }
        for (;;) {
            bf[1] = getchar();
            if (isspace(bf[1])) continue;
            if (bf[1] == EOF) goto end;
            break;
        }
        b = strtoul(bf, NULL, 16);
        //printf("%s = %d\n", bf, b);
        fwrite(&b, sizeof b, 1, stdout);
    }
end:
    exit(0);
}
Here's a test file:
%cat test.txt
E244050BF817B01D5E271F90052E0DD0
A9A5D1A2468E6908D4CF9951FC544A7B
0A5DF5692545A8856F3EF2CA5440A365
0FE4C9BC9854B042514E4805F0D0C4FF
Here's a run on a UNIX system (output perfectly as expected):
%./hex2bin < /mnt/test.txt | od -t x1
0000000 e2 44 05 0b f8 17 b0 1d 5e 27 1f 90 05 2e 0d d0
0000020 a9 a5 d1 a2 46 8e 69 08 d4 cf 99 51 fc 54 4a 7b
0000040 0a 5d f5 69 25 45 a8 85 6f 3e f2 ca 54 40 a3 65
0000060 0f e4 c9 bc 98 54 b0 42 51 4e 48 05 f0 d0 c4 ff
0000100
Here's a run on a Windows system (a carriage return creeps in after byte 7b):
%./hex2bin.exe < test.txt | od -t x1
0000000 e2 44 05 0b f8 17 b0 1d 5e 27 1f 90 05 2e 0d d0
0000020 a9 a5 d1 a2 46 8e 69 08 d4 cf 99 51 fc 54 4a 7b
0000040 0d 0a 5d f5 69 25 45 a8 85 6f 3e f2 ca 54 40 a3
0000060 65 0f e4 c9 bc 98 54 b0 42 51 4e 48 05 f0 d0 c4
0000100 ff
0000101
%
The right sequence should be [...] 7b 0a [...] but it comes out as [...] 7b 0d 0a [...]. What's happening here?
Upvotes: 2
Views: 212
Reputation: 85767
Windows text files use the byte sequence 0D 0A to mark the end of a line (Unix uses only a single byte, 0A). The C standard library translates between this external encoding and the internal "virtual newline" character ('\n') that C uses.
That is, when a C program running on Windows writes '\n' to a text stream, it gets translated to 0D 0A. The inverse operation happens on input. Because '\n' is a real char value (typically 10), a data byte with that value is indistinguishable from '\n' and gets translated as well. That is what happened to the 0A byte following 7b in your output.
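To see the translation in isolation, here is a minimal sketch that writes a single '\n' to a file and counts the raw bytes that end up on disk. The helper name newline_size_on_disk and the file names are made up for illustration. I'd expect it to report 1 byte for mode "wb" everywhere, and 2 bytes for mode "w" on Windows (1 on Unix-like systems).

```c
#include <stdio.h>

/* Hypothetical helper: writes one '\n' to `path` using the given fopen
   mode and returns the number of raw bytes in the file, or -1 on failure. */
long newline_size_on_disk(const char *path, const char *mode)
{
    FILE *f = fopen(path, mode);
    if (f == NULL) return -1;
    fputc('\n', f);
    if (fclose(f) != 0) return -1;

    /* Reopen in binary mode so that no translation hides the raw bytes. */
    f = fopen(path, "rb");
    if (f == NULL) return -1;
    long size = 0;
    while (fgetc(f) != EOF) size++;
    fclose(f);
    remove(path);   /* clean up the scratch file */
    return size;
}
```

Calling newline_size_on_disk("scratch.tmp", "w") and newline_size_on_disk("scratch.tmp", "wb") side by side shows whether your platform's text streams expand the newline.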
If you don't want this behavior (e.g. because you're writing or reading binary data, not text), you need to use a binary stream, not a text stream.
For normal files this is easy: just add "b" to the open mode when calling fopen. For the predefined streams (stdin / stdout / stderr) there is no portable solution as far as I'm aware, but Windows has an extra function to put an existing stream into binary mode; see e.g. this answer.
It shows what amounts to the following code (also seen in the official Microsoft documentation):
#include <stdio.h>
#include <fcntl.h>
#include <io.h>
...
_setmode( _fileno( stdout ), _O_BINARY );
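If the same source has to build on both Windows and Unix, one option is to hide the call behind a preprocessor check. This is only a sketch: set_binary_mode is a name I made up, and the non-Windows branch assumes a platform where text and binary streams behave identically (true for Unix-like systems, not guaranteed elsewhere).

```c
#include <stdio.h>

#ifdef _WIN32
#include <fcntl.h>
#include <io.h>
#endif

/* Hypothetical helper: switch an already-open stream to binary mode.
   Returns 0 on success, -1 on failure. */
int set_binary_mode(FILE *stream)
{
#ifdef _WIN32
    /* _setmode returns the previous mode on success and -1 on error. */
    return _setmode(_fileno(stream), _O_BINARY) == -1 ? -1 : 0;
#else
    /* Assumption: no text/binary distinction here, so nothing to do. */
    (void)stream;
    return 0;
#endif
}
```

In your program you would call set_binary_mode(stdout) once at the top of main, before the first fwrite.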
There are a few bugs in your code:
bf[0] = getchar();
if (isspace(bf[0])) continue;
if (bf[0] == EOF) goto end;
The two if conditions are broken because bf[0] is a char. A char is not big enough to store EOF, which is a special non-character value returned by getchar to signal an error or end-of-file. In general, getchar will return a non-negative value for successful input and a negative value (EOF, typically -1) on error or end-of-file. By assigning this value to a char, you're truncating EOF and mapping it to some real character value.
The behavior of the bf[0] == EOF check depends on whether char is a signed type on your platform (it probably is). If so, it will confuse some other character (normally 255, which corresponds to ÿ in ISO-8859-1) for end-of-file. If char is unsigned, this condition is never true, so you'll get an infinite loop.
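A small sketch of the difference, using two helper functions whose names are invented for this example. The first squeezes the value through a char before comparing, like your loop does; the second compares the int directly. Whatever the signedness of char on your platform, the first one gives the same answer for the byte 0xFF and for EOF, so it cannot tell them apart.

```c
#include <stdio.h>

/* Mimics the buggy pattern: the value is stored in a char first,
   so the information that distinguishes EOF from a byte is lost. */
int looks_like_eof_as_char(int value)
{
    char ch = (char)value;
    return ch == EOF;
}

/* Compares the untouched int, as getchar's return value should be. */
int looks_like_eof_as_int(int value)
{
    return value == EOF;
}
```

With a signed char both 0xFF and EOF are reported as end-of-file by the first helper; with an unsigned char neither is. The second helper only matches the real EOF.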
Similarly, isspace(bf[0]) is broken if char is a signed type because all the is... functions have undefined behavior if their argument does not fit inside an unsigned char (with one special exception: EOF is allowed).
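If you ever need to classify a value that is already stored in a char (as opposed to the int straight from getchar), the usual idiom is a cast to unsigned char at the call site. A minimal sketch, with is_space_char being a made-up wrapper name:

```c
#include <ctype.h>

/* Hypothetical wrapper: safe to call with any char value, including
   negative ones on platforms where char is signed. */
int is_space_char(char ch)
{
    /* The cast brings the argument into the range isspace accepts. */
    return isspace((unsigned char)ch) != 0;
}
```

The cast is not needed when the argument comes directly from getchar, because that value is already either EOF or in the unsigned char range.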
The fix is to store the result of getchar in an int first:
int c = getchar();
if (c == EOF) goto end;
if (isspace(c)) continue;
bf[0] = c;
break;
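Putting the pieces together, the conversion loop could look roughly like the sketch below. The function names next_nonspace and hex_to_bin are mine, and taking FILE * parameters instead of using stdin/stdout directly is just a design choice to make it easier to test. Like your original, it does not validate that the characters are hex digits and silently drops a trailing unpaired digit.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

/* Returns the next non-whitespace character from `in`, or EOF.
   The result stays in an int so EOF remains distinguishable. */
static int next_nonspace(FILE *in)
{
    int c;
    while ((c = getc(in)) != EOF && isspace(c))
        ;
    return c;
}

/* Reads pairs of hex digits from `in` and writes one raw byte per pair
   to `out`. Returns the number of bytes written. */
size_t hex_to_bin(FILE *in, FILE *out)
{
    size_t written = 0;
    for (;;) {
        char bf[3] = {0};            /* two digits plus terminating 0 */
        int hi = next_nonspace(in);
        if (hi == EOF) break;
        int lo = next_nonspace(in);
        if (lo == EOF) break;
        bf[0] = (char)hi;
        bf[1] = (char)lo;
        unsigned char b = (unsigned char)strtoul(bf, NULL, 16);
        if (fwrite(&b, 1, 1, out) != 1) break;
        written++;
    }
    return written;
}
```

main would then reduce to switching stdout to binary mode (on Windows) and calling hex_to_bin(stdin, stdout).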
Upvotes: 6