Tiger1

Reputation: 1377

How to decode strings saved in utf-8 format

I'm trying to decode the strings in the list below. They were all encoded in utf-8 format.

_str = ['."\n\nThe vicar\'', ':--\n\nIn the', 'cathedral']

Expected output:

['.The vicar', ':--In the', 'cathedral']

My attempts

>>> for x in _str:
    x.decode('string_escape')
    print x


'."\n\nThe vicar\''
."

The vicar'
':--\n\nIn the'
:--

In the
'cathedral'
cathedral
>>> print [x.decode('string_escape') for x in _str]
['."\n\nThe vicar\'', ':--\n\nIn the', 'cathedral']

Both attempts failed. Any ideas?

Upvotes: 0

Views: 66

Answers (1)

Ammar

Reputation: 1314

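It looks like you want to remove certain characters from each string in the list rather than decode anything. The strings already contain real newline characters, so decoding with 'string_escape' leaves them unchanged. A simple regex can strip the unwanted characters: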

import re
print [re.sub(r'[."\'\n]','',x) for x in _str]

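This regex removes every occurrence of the characters ., ", ' and \n. Note that it also drops the leading period that the expected output in the question keeps, so the result will be: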

['The vicar', ':--In the', 'cathedral']
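If the leading period should stay, as in the expected output in the question, one option is to leave . out of the character class. The following is only a minimal sketch: the name cleaned is made up for illustration, and print is written with parentheses so the snippet runs on both Python 2 and Python 3.

```python
import re

# Sample list from the question; the strings contain real newline characters.
_str = ['."\n\nThe vicar\'', ':--\n\nIn the', 'cathedral']

# Character class without the period: strips only ", ' and newline.
cleaned = [re.sub(r'["\'\n]', '', x) for x in _str]

print(cleaned)
```

This prints ['.The vicar', ':--In the', 'cathedral'], which matches the expected output for this sample. Any other characters that need removing can be added to the character class.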

Hope this helps.

Upvotes: 1
