user2560609

How to convert a unicode list of tuples into utf-8 with python

My function returns a tuple which is then assigned to a variable x and appended to a list.

x = (u'string1', u'string2', u'string3', u'string4')
resultsList.append(x)

The function is called multiple times, and the final list consists of 20 tuples.

The strings within the tuples are unicode strings, and I would like to convert them to UTF-8.

Some of the strings also include non-ASCII characters like ö, ä, etc.

Is there a way to convert them all in one step?

Answers (1)

Martijn Pieters

Use a nested list comprehension:

encoded = [[s.encode('utf8') for s in t] for t in resultsList]

This produces a list of lists (not tuples) containing UTF-8 encoded byte strings.
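If you want to keep the tuple structure instead of getting lists back, wrapping the inner generator in `tuple()` should do it. The sketch below assumes Python 3, where `u'...'` literals are ordinary `str` objects and `.encode()` returns `bytes`; the sample data is hypothetical and only mimics the shape of `resultsList` from the question.

```python
# Hypothetical sample data shaped like the question's resultsList
resultsList = [
    (u'string1', u'string2', u'string3', u'string4'),
    (u'K\u00f6ln', u'M\u00fcnchen', u'Stra\u00dfe', u'\u00e4'),
]

# Encode every string in every tuple to UTF-8 bytes, keeping tuples as tuples
encoded = [tuple(s.encode('utf-8') for s in t) for t in resultsList]

print(encoded[1])
# (b'K\xc3\xb6ln', b'M\xc3\xbcnchen', b'Stra\xc3\x9fe', b'\xc3\xa4')
```

Decoding each element with `.decode('utf-8')` reverses the conversion if you need text again.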

If you print these lists, you'll see Python represent the contents of the byte strings as Python string literals: with quotes, and with any bytes that are not printable ASCII characters shown as escape sequences:

>>> l = ['Kaiserstra\xc3\x9fe']
>>> l
['Kaiserstra\xc3\x9fe']
>>> l[0]
'Kaiserstra\xc3\x9fe'
>>> print l[0]
Kaiserstraße

This is normal: that representation is meant for debugging. The \xc3 and \x9f escape sequences represent the two UTF-8 bytes C3 9F (hexadecimal) that encode the small ringel-s character (ß, also known as Eszett).
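The transcript above is Python 2, where a plain `'...'` literal is already a byte string. A rough Python 3 equivalent, assuming you start from text and encode it yourself, shows the same two bytes for ß; note that Python 3 prints byte strings with a `b` prefix.

```python
# Python 3: str holds text, bytes holds encoded data
word = 'Kaiserstra\u00dfe'           # the text "Kaiserstraße"
data = word.encode('utf-8')          # UTF-8 encoded bytes

print(data)                          # b'Kaiserstra\xc3\x9fe'
print(data.hex())                    # 4b616973657273747261c39f65
print(len(word), len(data))          # 12 13 (ß takes two bytes)
```

The extra byte in the length comparison is exactly the ß, which UTF-8 stores as C3 9F.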
