Cristian

Reputation: 495

Trouble decoding utf-16 string

I'm using Python 3.3. I've been trying to decode a bytes string that looks like this:

b'\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x03\xed:\xf9w\xdaH\xd2?\xcf\xbc....

and it keeps going. However, whenever I try to decode it using .decode('utf-16'), I get an error saying:

'utf16' codec can't decode bytes in position 54-55: illegal UTF-16 surrogate

I'm not exactly sure how to decode this string.

Upvotes: 1

Views: 2719

Answers (1)

unutbu

Reputation: 879471

Gzipped data begins with the bytes \x1f\x8b\x08 (the two-byte gzip magic number followed by the code for deflate compression), so my guess is that your data is gzipped. Try gunzipping the data before decoding it.

import io
import gzip

# This raises an error (IOError, or EOFError on newer Python 3 versions) because
# `buf` is truncated here. It may work if you supply the complete buf.
buf = b'\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x03\xed:\xf9w\xdaH\xd2?\xcf\xbc'
with gzip.GzipFile(fileobj=io.BytesIO(buf)) as f:
    content = f.read()
    print(content.decode('utf-16'))
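
If the whole payload is already in memory, the same idea can be sketched with gzip.decompress. This is only a rough sketch: the helper name decode_maybe_gzipped and the sample text are made up for illustration, and it assumes the decompressed bytes really are UTF-16, which the truncated data in the question cannot confirm.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def decode_maybe_gzipped(data: bytes, encoding: str = "utf-16") -> str:
    """Gunzip `data` if it starts with the gzip magic number, then decode it.

    Hypothetical helper; the default encoding is an assumption taken from
    the question, not something verified against the real payload.
    """
    if data.startswith(GZIP_MAGIC):
        data = gzip.decompress(data)
    return data.decode(encoding)


# Round trip on made-up sample text, since the buffer in the question is truncated.
sample = gzip.compress("héllo wörld".encode("utf-16"))
print(decode_maybe_gzipped(sample))
```

If decoding still fails after decompression, the payload is probably not UTF-16; inspecting the first few decompressed bytes (for example, looking for a byte order mark) is a reasonable next step.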

Upvotes: 3
