Reputation: 4119
I am attempting to parse a file that I believe is UTF-16 encoded (the file magic is 0xFEFF), and I can open the file just as I want with:
f = open(file, 'rb')
But when, for example, I do
print f.read(40)
it prints the actual Unicode strings from the file, whereas I would like to access the raw hexadecimal data and read it byte by byte. This may be a stupid question, but I haven't been able to find out how to do this.
As a follow-up question: once I get this working, I would like to parse the file looking for a specific sequence of bytes, in this case:
0x00 00 00 43 00 00 00
After that pattern is found, I want to begin parsing an entry. What's the best way to accomplish this? I was thinking about using a generator to walk through each byte and, once this pattern shows up, yield the bytes until the next instance of that pattern. Is there a more efficient way to do this?
EDIT: I am using Python 2.7
Upvotes: 3
Views: 567
Reputation: 30181
If you want a string of hexadecimal digits, you can pass the data through binascii.hexlify():
import binascii

with open(filename, 'rb') as f:
    raw = f.read(40)
    hexadecimal = binascii.hexlify(raw)
    print(hexadecimal)
(This also works without modification on Python 3)
If you need the numerical value of each byte, you can call ord() on each element, or equivalently, map() the function over the string:
with open(filename, 'rb') as f:
    raw = f.read(40)
    byte_list = map(ord, raw)
    print byte_list
(This doesn't work on Python 3, but on 3.x you can just iterate over raw directly, because iterating over a bytes object already yields integers.)
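If the same code has to run on both 2.7 and 3.x, one option is to go through bytearray, which yields integers when iterated on either version. A minimal sketch (the function name read_byte_values and its count parameter are made up for illustration):

```python
def read_byte_values(path, count=40):
    """Return the first `count` bytes of the file as a list of ints (0-255)."""
    with open(path, 'rb') as f:
        raw = f.read(count)
    # bytearray iterates as integers on both Python 2.7 and 3.x,
    # so no ord() call is needed
    return list(bytearray(raw))
```

Each value can then be formatted as needed, for example with '%02x' % value.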
Upvotes: 1
Reputation: 4950
Shouldn't you just be able to do this?
>>> string = 'string'
>>> hex(ord(string[1]))
'0x74'
hexString = ''
# open in binary mode so the raw bytes are read unchanged
with open(filename, 'rb') as f:
    while True:
        #char = f.read(1)
        chars = f.read(40)
        if not chars:
            break  # end of file
        hexString += ''.join(hex(ord(char)) for char in chars)
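For the follow-up part of the question (finding the 00 00 00 43 00 00 00 marker), walking byte by byte is probably not needed if the file fits in memory: the built-in split() and find() methods on byte strings do the search for you. A rough sketch, assuming the whole file has been read into data and that anything before the first marker is a header you want to skip (the names PATTERN, split_entries and iter_entries are made up for illustration):

```python
PATTERN = b'\x00\x00\x00\x43\x00\x00\x00'  # marker bytes from the question

def split_entries(data, pattern=PATTERN):
    """Return the chunks that follow each occurrence of `pattern`.

    Assumption: whatever precedes the first marker is a header and is dropped.
    """
    return data.split(pattern)[1:]

def iter_entries(data, pattern=PATTERN):
    """Yield the same chunks lazily, using find() instead of building a list."""
    start = data.find(pattern)
    while start != -1:
        begin = start + len(pattern)       # first byte after the marker
        end = data.find(pattern, begin)    # next marker, or -1 if none
        yield data[begin:end] if end != -1 else data[begin:]
        start = end
```

You would call it with something like data = open(filename, 'rb').read() and then loop over iter_entries(data). For files too large to read at once this would need a chunked reader that handles markers straddling chunk boundaries, which the sketch does not attempt.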
Upvotes: 1