Juanjo Conti

Reputation: 30013

Replace non-ascii chars from a unicode string in Python

How can I replace non-ASCII chars in a unicode string in Python?

These are the outputs I expect for the given inputs:

música -> musica

cartón -> carton

caño -> cano

Maybe with a dict where 'á' is a key and 'a' is the value?

Upvotes: 12

Views: 8696

Answers (2)

fiacobelli

Reputation: 1990

Now, just to supplement that answer: it may be the case that your data does not come in as unicode (e.g. you are reading a file in another encoding, so you cannot prefix the string with a "u"). Here's a snippet that may work too (mostly for those reading files in English, where dropping the accents loses little).

import unicodedata
# Python 2: someString is a byte string read from an ISO-8859-1 (Latin-1) source
unicodedata.normalize('NFKD', unicode(someString, "ISO-8859-1")).encode("ascii", "ignore")
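For anyone on Python 3, where the `unicode` builtin no longer exists, the same idea might look roughly like the sketch below. The helper name `latin1_bytes_to_ascii` is made up for illustration, and it assumes the input really is ISO-8859-1 encoded bytes.

```python
import unicodedata


def latin1_bytes_to_ascii(raw):
    """Decode ISO-8859-1 bytes, then drop accents (hypothetical helper)."""
    # Decode first: bytes -> str, assuming the source encoding is ISO-8859-1.
    text = raw.decode("iso-8859-1")
    # NFKD splits an accented letter into base letter + combining mark.
    decomposed = unicodedata.normalize("NFKD", text)
    # Encoding to ASCII with "ignore" discards the combining marks.
    return decomposed.encode("ascii", "ignore").decode("ascii")


raw = b"cart\xf3n"  # the bytes for "cartón" in ISO-8859-1
print(latin1_bytes_to_ascii(raw))
```

If the file is in a different encoding, only the `decode` argument should need to change.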

Upvotes: 7

llasram

Reputation: 4475

If all you want to do is degrade accented characters to their non-accented equivalents:

>>> import unicodedata
>>> unicodedata.normalize('NFKD', u"m\u00fasica").encode('ascii', 'ignore')
'musica'
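The session above is Python 2. A rough Python 3 version, wrapped in a function whose name (`strip_accents`) is just an illustrative choice, could look like this. In Python 3 `encode` returns bytes, so the sketch decodes back to `str` at the end.

```python
import unicodedata


def strip_accents(text):
    """Return text with accents removed (hypothetical helper name)."""
    # NFKD decomposes e.g. "ú" into "u" followed by U+0301 (combining acute).
    decomposed = unicodedata.normalize("NFKD", text)
    # Encoding with errors="ignore" silently drops every non-ASCII code point,
    # which includes the combining marks left over from the decomposition.
    return decomposed.encode("ascii", "ignore").decode("ascii")


for word in ("m\u00fasica", "cart\u00f3n", "ca\u00f1o"):
    print(strip_accents(word))
```

One thing to keep in mind: characters that have no decomposition, such as "ø", are not mapped to a base letter but removed entirely, so this is only a good fit when accents are the main concern.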

Upvotes: 21
