DJMcCarthy12

Reputation: 4119

Python Opening UTF-16 file read each byte

I am attempting to parse a file that I believe is UTF-16 encoded (the file magic is 0xFEFF), and I can open the file just as I want with:

 f = open(file, 'rb')

But when for example I do

print f.read(40)

it prints the actual Unicode strings of the file, whereas I would like to access the hexadecimal data and read it byte by byte. This may be a stupid question, but I haven't been able to find out how to do this.

Also, as a follow-up question: once I get this working, I would like to parse the file looking for a specific sequence of bytes, in this case:

0x00 00 00 43 00 00 00

After that pattern is found, I want to begin parsing an entry. What's the best way to accomplish this? I was thinking about using a generator to walk through each byte and, once this pattern shows up, yield the bytes until the next instance of that pattern. Is there a more efficient way to do this?
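
The generator idea described above could be sketched roughly as follows. This is only a minimal illustration, assuming the whole file fits in memory and that an "entry" is everything between one occurrence of the marker and the next; the names PATTERN and iter_entries are made up for this example. It uses the built-in find() method instead of walking byte by byte, and it runs on both Python 2.7 and Python 3.

```python
# The 7-byte marker from the question: 0x00 00 00 43 00 00 00
PATTERN = b'\x00\x00\x00\x43\x00\x00\x00'


def iter_entries(data, pattern=PATTERN):
    """Yield the bytes following each occurrence of pattern, up to the next one."""
    start = data.find(pattern)
    while start != -1:
        body_start = start + len(pattern)
        # Look for the next marker after the current entry begins.
        next_start = data.find(pattern, body_start)
        if next_start == -1:
            # Last entry: runs to the end of the data.
            yield data[body_start:]
        else:
            yield data[body_start:next_start]
        start = next_start


# Example with in-memory data; for a real file, read it with open(..., 'rb').
sample = b'junk' + PATTERN + b'abc' + PATTERN + b'de'
entries = list(iter_entries(sample))
```

Anything before the first marker is skipped. For files too large to read at once, a chunked variant would need to handle markers that straddle chunk boundaries, which this sketch does not attempt.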

EDIT: I am using Python 2.7

Upvotes: 3

Views: 567

Answers (2)

Kevin

Reputation: 30181

If you want a string of hexadecimal digits, you can pass the raw bytes through binascii.hexlify():

import binascii

with open(filename, 'rb') as f:
    raw = f.read(40)
    # hexlify() returns two hex digits per input byte
    hexadecimal = binascii.hexlify(raw)
    print(hexadecimal)

(This also works without modification on Python 3)

If you need the numerical value of each byte, you can call ord() on each element, or equivalently, map() the function over the string:

with open(filename, 'rb') as f:
    raw = f.read(40)
    byte_list = map(ord, raw)
    print byte_list

(This doesn't work on Python 3, but on 3.x you can just iterate over raw directly, because iterating over a bytes object already yields integers.)
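
If the same code needs to run on both 2.7 and 3.x, one possible approach is to wrap the data in a bytearray, which yields integers when iterated on either version. The sketch below is only an illustration: byte_values is a made-up helper name, and the sample bytes stand in for what f.read(40) would return from a file opened in 'rb' mode.

```python
def byte_values(raw):
    """Return the integer value (0-255) of every byte in raw."""
    # bytearray iterates as integers on both Python 2.7 and Python 3.
    return list(bytearray(raw))


# Stand-in for data read from a file: a big-endian BOM followed by 0x00 0x43.
sample = b'\xfe\xff\x00\x43'
values = byte_values(sample)

# Two hex digits per byte, one list element per byte.
hex_pairs = ['%02x' % v for v in values]
```

Here values holds the numeric bytes and hex_pairs holds the same bytes as two-digit hex strings, which can be convenient when comparing against a pattern written out in hex.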

Upvotes: 1

PyNEwbie

Reputation: 4950

Shouldn't you just be able to do this?

>>> string = 'string'
>>> hex(ord(string[1]))
'0x74'

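hexString = ''
with open(filename, 'rb') as f:  # binary mode, so bytes are read unchanged
    while True:
        # use f.read(1) instead to go strictly one byte at a time
        chars = f.read(40)
        if not chars:  # an empty string means end of file
            break
        hexString += ''.join(hex(ord(char)) for char in chars)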

Upvotes: 1
