Python Opening UTF-16 file read each byte

Question

I am attempting to parse a file that I believe is UTF-16 encoded (the file magic is 0xFEFF), and I can open the file just as I want with:

 f = open(file, 'rb')

But when for example I do

print f.read(40)

it prints the file's contents as text, whereas I would like to see the hexadecimal values and read them byte by byte. This may be a stupid question, but I haven't been able to find out how to do this.

Also, as a follow-up question: once I get this working, I would like to parse the file looking for a specific sequence of bytes, in this case:

0x00 00 00 43 00 00 00

Once that pattern is found, I want to begin parsing an entry. What's the best way to accomplish this? I was thinking about using a generator to walk through each byte and, once this pattern shows up, yield the bytes until the next instance of that pattern. Is there a more efficient way to do this?

EDIT: I am using Python 2.7

PyNEwbie · Accepted Answer

Shouldn't you just be able to do this? ord() gives the numeric value of a character, and hex() formats it as hexadecimal:

>>> string = 'string'
>>> hex(ord(string[1]))
'0x74'

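hexString = ''
# Open in binary mode so the bytes are returned untouched.
with open(filename, 'rb') as f:
    while True:
        chars = f.read(40)  # use f.read(1) to go one byte at a time
        if not chars:
            break  # an empty read means end of file
        # bytearray yields integers on both Python 2 and 3,
        # so hex() can be applied to each byte directly.
        hexString += ''.join(hex(b) for b in bytearray(chars))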
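
For the follow-up question, walking the file one byte at a time is probably unnecessary: the built-in find() method on byte strings can locate the pattern directly. The following is only a minimal sketch, assuming the whole file fits in memory and that the leftmost match is always the real delimiter. The names MARKER and iter_entries are made up for illustration and are not from the question.

```python
# Hypothetical name for the 7-byte pattern given in the question.
MARKER = b'\x00\x00\x00\x43\x00\x00\x00'


def iter_entries(data, marker=MARKER):
    """Yield the bytes after each marker, up to the next marker or end of data."""
    start = data.find(marker)
    while start != -1:
        body_start = start + len(marker)
        next_start = data.find(marker, body_start)
        if next_start == -1:
            # Last entry: everything up to the end of the data.
            yield data[body_start:]
        else:
            yield data[body_start:next_start]
        start = next_start


# Small in-memory sample standing in for the contents of f.read().
sample = b'\xfe\xffjunk' + MARKER + b'abc' + MARKER + b'de'
entries = list(iter_entries(sample))
```

With a real file you would read it once with open(filename, 'rb') and pass the result to iter_entries. Anything before the first marker (such as the 0xFEFF byte order mark) is skipped. For files too large to read at once, the mmap module may be a reasonable alternative, since mmap objects also provide find().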
