Reputation: 2625
I have a list that can contain a mix of str and unicode strings:
lst = ['string1', u'string2', 'string3', u'string4']
I need to convert every list item to unicode if the item is a str. To convert a str to unicode I use:
s = s.decode('utf-8')
The problem is that if the string is already unicode and contains a non-ASCII character, trying to decode it raises UnicodeEncodeError: 'ascii' codec can't encode character ...
So I tried something like this:
lst = [i.decode('utf-8') for i in lst if isinstance(i, str)]
But this actually removes the unicode strings from the list.
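A small sketch reproduces the behaviour. It uses bytes literals (b'...') so it runs on both Python 2.6+ and Python 3; on Python 2, bytes is simply another name for str. The sample list is made up for illustration. A trailing if clause in a list comprehension acts as a filter, so only the items that pass the test end up in the result:

```python
# Hypothetical sample data: byte strings and text (unicode) strings mixed.
# On Python 2, bytes is an alias for str, so isinstance(x, bytes) is the
# same check as isinstance(x, str) in the question.
lst = [b'string1', u'string2', b'string3', u'string4']

# The trailing "if" filters: items that fail the test are dropped entirely.
filtered = [i.decode('utf-8') for i in lst if isinstance(i, bytes)]

print(len(lst))       # 4
print(len(filtered))  # 2, the unicode items are gone
print(filtered)
```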
Upvotes: 1
Views: 1645
Reputation: 22953
While you could use a ternary expression in your list comprehension to convert the elements correctly, in my opinion it would be cleaner to extract the logic into a separate helper function:
def convert_to_unicode(s):
    """
    Convert `s` to unicode. If `s` is already
    unicode, return `s` as is.
    """
    if isinstance(s, str):
        return s.decode('utf-8')
    else:
        return s
Then you can simply call the function on each element of your list:
lst = [convert_to_unicode(i) for i in lst]
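If the same helper also has to run on Python 3, one possible sketch is to test for bytes instead of str: on Python 2 bytes is just another name for str, while on Python 3 it is the byte-string type and str is already text. The name to_text and its encoding parameter are hypothetical, not part of any library:

```python
def to_text(s, encoding='utf-8'):
    """Return s as a text (unicode) string.

    Byte strings are decoded with `encoding`; anything else, including
    strings that are already text, is returned unchanged.
    """
    # bytes is an alias of str on Python 2.6+, so this matches
    # isinstance(s, str) there and is the correct check on Python 3.
    if isinstance(s, bytes):
        return s.decode(encoding)
    return s


# Hypothetical mixed input, including the UTF-8 bytes for u'caf\xe9'.
lst = [b'string1', u'string2', b'caf\xc3\xa9', u'caf\xe9']
lst = [to_text(i) for i in lst]
print(lst)
```

Note that decode('utf-8') still raises UnicodeDecodeError for byte strings that are not valid UTF-8; passing an errors argument such as 'replace' to decode is one way to relax that if lossy conversion is acceptable.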
Upvotes: 0
Reputation: 1122182
You are filtering (removing non-matching elements); you need to use a conditional expression instead:
lst = [i.decode('utf-8') if isinstance(i, str) else i for i in lst]
The <true> if <condition> else <false> expression here always produces an output: either the decoded string, or the original object unchanged if it is not a str object.
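To see why nothing is dropped, here is roughly what that comprehension does when written as an explicit loop. This is a sketch for illustration; it checks for bytes rather than str so it also runs on Python 3 (on Python 2 the two names refer to the same type), and the sample list is made up. Both branches append something, so the result always has as many items as the input:

```python
lst = [b'string1', u'string2', b'string3', u'string4']

result = []
for i in lst:
    if isinstance(i, bytes):              # bytes is str on Python 2
        result.append(i.decode('utf-8'))  # <true> branch: decoded text
    else:
        result.append(i)                  # <false> branch: item unchanged

# Every input item produced exactly one output item.
print(len(result))  # 4
```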
Upvotes: 4
Reputation: 57033
Try this:
lst = [i.decode('utf-8') if isinstance(i, str) else i for i in lst]
Upvotes: 4