Reputation: 3326
I am converting a small program from C to Python and I'm having trouble reading the file. It is a binary .dat file, shown here as a hex dump. Here are the first 132 bytes that I'm trying to read:
2400 0000 4c61 7a61 726f 2053 756e 6965
7200 ffff 0000 0000 7261 6a70 6f6f 7420
6279 776f 726b 2069 7363 6869 6f70 7562
6963 2073 6872 6f76 6574 6964 6520 6469
7373 7561 5275 746c 616e 642c 5665 726d
6f6e 742c 0d00 0000 7000 0000 0000 0000
0000 0000 0000 0000 4000 0000 0000 0000
ffff ffff 656e 2073 6f76 6572 6f62 6564
6965 6e74
The C code to read this opens the file in fp and reads it like this:
TEXT_SHORT = 64;
fread(&(record->id), sizeof(int), 1, fp);
fread(&(record->name[0]), sizeof(char), TEXT_SHORT, fp);
fread(&(record->location[0]), sizeof(char), TEXT_SHORT, fp);
printf("%06d\n", record->id);
printf("%s\n", record->name);
printf("%s\n", record->location);
Then when printing the values, I get this:
36
Lazaro Sunier
Rutland,Vermont,
To convert this functionality to Python, I wrote the following code:
def read_file(file):
    id = struct.unpack('i', file.read(4))[0]
    name = ''.join(struct.unpack('c'*64, file.read(64)))
    location = ''.join(struct.unpack('c'*64, file.read(64)))
    print(id)
    print(name)
    print(location)
Then I get this output
36
Lazaro Sunier��rajpoot bywork ischiopubic shrovetide dissua
p@����en soverobedient
I have been struggling with this for a while and have no idea why it is happening. Is there something that fread() does in the background that I need to implement in Python, or am I doing it wrong?
Upvotes: 1
Views: 1249
Reputation: 33076
Although you are reading a 64-byte block in both C and Python, Python does not treat \x00 as a string terminator. So, while printf in C prints only up to the first \0, Python prints the whole buffer, trailing garbage included.
Just split the string at \0 and keep only the first part:
name = name.split(b"\0", 1)[0]
location = location.split(b"\0", 1)[0]
Incidentally, you can retrieve the 3 elements in a single line:
id, name, location = struct.unpack("i64s64s", file.read(132))
name = name.split(b"\0", 1)[0]
location = location.split(b"\0", 1)[0]
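For Python 3, where file.read() returns bytes rather than str, the same idea might look like the sketch below. It assumes the record is a little-endian 4-byte int followed by two 64-byte char arrays with no padding, which is consistent with the hex dump in the question. The names RECORD, c_string and parse_record are made up for illustration, and the sample bytes are copied from the dump.

```python
import struct

# Assumed layout, taken from the C fread() calls: int id, char name[64], char location[64].
# "<" forces little-endian with no alignment padding (an assumption about the file).
RECORD = struct.Struct("<i64s64s")


def c_string(buf):
    """Return the bytes before the first NUL, which is all printf("%s") would print."""
    return buf.partition(b"\0")[0]


def parse_record(raw):
    """Unpack one 132-byte record and cut both text fields at their first NUL."""
    rec_id, name, location = RECORD.unpack(raw)
    return rec_id, c_string(name).decode("ascii"), c_string(location).decode("ascii")


# The 132 bytes from the hex dump in the question.
SAMPLE_HEX = """
2400 0000 4c61 7a61 726f 2053 756e 6965
7200 ffff 0000 0000 7261 6a70 6f6f 7420
6279 776f 726b 2069 7363 6869 6f70 7562
6963 2073 6872 6f76 6574 6964 6520 6469
7373 7561 5275 746c 616e 642c 5665 726d
6f6e 742c 0d00 0000 7000 0000 0000 0000
0000 0000 0000 0000 4000 0000 0000 0000
ffff ffff 656e 2073 6f76 6572 6f62 6564
6965 6e74
"""
sample = bytes.fromhex("".join(SAMPLE_HEX.split()))

rec_id, name, location = parse_record(sample)
print(rec_id)          # 36
print(name)            # Lazaro Sunier
print(repr(location))  # 'Rutland,Vermont,\r'
```

Note that in this sample the location field has a carriage return (0x0d) just before the NUL, so you may want to call .rstrip() on it. With a real file, open it in binary mode ("rb") and pass file.read(RECORD.size) to parse_record.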
Upvotes: 5