Jon

Reputation: 1489

How can I check for unicode or escape sequences in a string?

I have a dictionary list of words, and some of the words contain sequences like this:

K\xc3\xb6LN or KöLN when displayed properly.

I'd like to purge such words from the list so that only words made of plain ASCII characters remain. How can I do a simple True/False check to see whether a string contains such sequences?

Upvotes: 1

Views: 315

Answers (1)

johnsyweb

Reputation: 141908

str.isalpha() may be of assistance here. The sessions below are from Python 2, where these literals are byte strings:

>>> 'KöLN'.isalpha()
False
>>> 'K\xc3\xb6LN'.isalpha()
False
>>> 'Cologne'.isalpha()
True

Filtering:

>>> [word for word in ('KöLN', 'K\xc3\xb6LN', 'Cologne') if word.isalpha()]
['Cologne']
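
Note that on Python 3 string literals are Unicode, and 'KöLN'.isalpha() returns True there, so isalpha() no longer separates ASCII words from non-ASCII ones. A minimal sketch of an explicit ASCII check, assuming Python 3; is_plain_ascii and the sample word list are hypothetical names used only for illustration:

```python
def is_plain_ascii(word):
    """Return True if every character in word is in the ASCII range."""
    try:
        # Encoding to ASCII fails on any code point above 127.
        word.encode("ascii")
    except UnicodeEncodeError:
        return False
    return True


# Hypothetical sample list; 'e-mail' shows that punctuation is still accepted.
words = ["KöLN", "K\xc3\xb6LN", "Cologne", "e-mail"]
ascii_words = [w for w in words if is_plain_ascii(w)]
print(ascii_words)
```

Unlike isalpha(), this keeps ASCII words that contain hyphens, digits or apostrophes. On Python 3.7 and later, str.isascii() performs the same check directly.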

Upvotes: 5
