calypso zhang

Reputation: 21

How to access part of an encoded (gb18030) string in Python

I am encoding Chinese characters using gb18030 in Python, and I want to access part of the encoded string. For example, the encoded string for 李 is '\xc0\xee'. I want to extract 'c0' and 'ee' from it. However, Python treats '\xc0\xee' as a 2-character string (one character per byte), not as an 8-character string. How do I turn it into an 8-character string so that I can access the individual Roman letters and digits in it?

Upvotes: 1

Views: 21

Answers (2)

briancaffey

Reputation: 2558

How about this:

li = "李"
values = str(li.encode('gb18030'))
values = [i.strip("'") for i in values.split("\\x")[1:]]

print(values)
['c0', 'ee']

How do you use repr() to get the values you are looking for?
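As a possible alternative that avoids parsing the printed form of the bytes, here is a minimal sketch assuming Python 3, where iterating over a bytes object yields integers. It formats each byte as two hex digits; the variable names are only illustrative.

```python
li = "李"
encoded = li.encode("gb18030")  # two bytes for this character

# Format every byte as two lowercase hex digits.
values = [format(b, "02x") for b in encoded]

print(values)
```

This should also cover bytes that happen to be printable ASCII, which the split on "\\x" would miss because they are not shown with an \x escape.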

Upvotes: 0

calypso zhang

Reputation: 21

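Found the solution: repr() will do. It returns the escaped form of the string, in which each backslash, letter, and digit is a separate character.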
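A rough sketch of the repr() approach, assuming Python 3 (where the encoded value is a bytes object and its repr carries a leading b' and a trailing ' that need to be sliced off). The names escaped, inner, and hex_text are just for illustration.

```python
encoded = "李".encode("gb18030")

# repr() gives the escaped, printable form, including the b'...' wrapper.
escaped = repr(encoded)

# Drop the leading b' and the trailing ' to keep only the escapes.
inner = escaped[2:-1]
print(inner, len(inner))

# bytes.hex() gives just the hex digits, without the \x markers.
hex_text = encoded.hex()
print(hex_text[:2], hex_text[2:])
```

One caveat: repr() shows printable ASCII bytes as themselves rather than as \x escapes, so the escaped form is not always 4 characters per byte. If only the hex digits are needed, bytes.hex() is probably the safer choice.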

Upvotes: 1
