Hyperion

Reputation: 2625

Python - Convert list item into unicode if item is string

I've a list that can have mixed str and unicode strings:

lst = ['string1', u'string2', 'string3', u'string4']

I need to convert every list item to unicode if the item is a str. To convert a str to unicode I use:

s = s.decode('utf-8')

The problem is that if the string is already unicode and contains a non-ASCII character, calling decode on it raises UnicodeEncodeError: 'ascii' codec can't encode character ...

So I tried something like:

lst = [i.decode('utf-8') for i in lst if isinstance(i, str)]

But this actually removes the unicode strings from the list.
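A minimal reproduction of that behaviour, assuming the same four-item list. It uses bytes instead of str so that it runs on both Python 2 (where bytes is simply another name for str) and Python 3 (where byte strings have type bytes). The variable name filtered is illustrative.

```python
# Mixed byte strings and text strings (Python 2: str/unicode; Python 3: bytes/str).
lst = [b'string1', u'string2', b'string3', u'string4']

# A trailing "if" in a list comprehension is a filter: elements that fail
# the test never reach the output at all.
filtered = [i.decode('utf-8') for i in lst if isinstance(i, bytes)]

# filtered == [u'string1', u'string3']; the u'...' items are gone.
```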

Upvotes: 1

Views: 1645

Answers (3)

Chris

Reputation: 22953

While you could use a ternary expression in your list comprehension to correctly convert elements, in my opinion it would be cleaner to extract the logic to a separate helper function:

def convert_to_unicode(s):
    """
    convert `s` to unicode. If `s` is already
    unicode, return `s` as is.
    """
    if isinstance(s, str):
        return s.decode('utf-8')
    else:
        return s

Then you can simply call the function on each element of your list:

lst = [convert_to_unicode(i) for i in lst]
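A sketch of the same helper that should also run on Python 3, assuming the list may contain bytes objects there. In Python 2, bytes is an alias for str, so the isinstance check behaves exactly like the one above. The name to_text and the encoding parameter are hypothetical, not part of this answer.

```python
def to_text(s, encoding='utf-8'):
    """Decode byte strings; return text strings (and anything else) unchanged."""
    # On Python 2, bytes is str, so this is the same test as isinstance(s, str).
    if isinstance(s, bytes):
        return s.decode(encoding)
    return s

# UTF-8 bytes and already-decoded text, including non-ASCII characters.
lst = [b'string1', u'string2', b'caf\xc3\xa9', u'na\xefve']
lst = [to_text(i) for i in lst]
# lst == [u'string1', u'string2', u'caf\xe9', u'na\xefve']
```

Keeping the check in a named function means the list comprehension stays a plain map over every element, so nothing can be filtered out by accident.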

Upvotes: 0

Martijn Pieters

Reputation: 1122182

You are filtering (removing non-matching elements); you need to use a conditional expression instead:

lst = [i.decode('utf-8') if isinstance(i, str) else i for i in lst]

The <true> if <condition> else <false> expression always produces a value, so no element is skipped. Here that value is the decoded string, or the original object unchanged if it is not a str object.
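To illustrate with non-ASCII data, here is a small sketch that uses bytes so it runs on Python 2 and 3. The u'na\xefve' element is the kind of value that raised UnicodeEncodeError in the question; here it is never decoded because the condition sends it to the else branch, which returns the very same object.

```python
lst = [b'caf\xc3\xa9', u'na\xefve']  # UTF-8 encoded bytes, and already-decoded text

# The conditional expression is evaluated once per element, so the output
# always has the same length as the input.
result = [i.decode('utf-8') if isinstance(i, bytes) else i for i in lst]

# result[0] == u'caf\xe9'  (decoded from UTF-8)
# result[1] is lst[1]      (text passed through untouched)
```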

Upvotes: 4

DYZ

Reputation: 57033

Try this:

lst = [i.decode('utf-8') if isinstance(i, str) else i for i in lst]

Upvotes: 4
