Peter Ragndahl

Reputation: 43

Parsing Unicode characters without a backslash using Python

Is there an easy way in Python 3 to parse Unicode escape sequences such as u00e4 that are missing their backslash? I would like to replace each sequence with the character it represents. My text looks like the one below.

Hju00e4lper dig, Tru00e4ffa lu00e4kare, sjuksku00f6terskor och psykologer mm

I could of course use some kind of regex matching, but is there an easier way of doing it in Python 3?

Upvotes: 0

Views: 152

Answers (1)

Mark Tolonen

Reputation: 177901

Use re.sub with a replacement function that converts the four hex digits after each "u" to the corresponding character:

>>> import re
>>> s = 'Hju00e4lper dig, Tru00e4ffa lu00e4kare, sjuksku00f6terskor och psykologer mm'
>>> re.sub(r'u([0-9a-f]{4})', lambda m: chr(int(m.group(1), 16)), s)
'Hjälper dig, Träffa läkare, sjuksköterskor och psykologer mm'
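If this is needed in more than one place, the same substitution can be wrapped in a small helper. This is only a sketch: the name restore_escapes is made up for illustration, and accepting uppercase hex digits is an assumption that goes beyond the sample text in the question.

```python
import re

# A literal "u" followed by exactly four hex digits.
# Accepting uppercase digits is an assumption; the question only shows lowercase.
_ESCAPE = re.compile(r'u([0-9a-fA-F]{4})')


def restore_escapes(text):
    """Replace backslash-less uXXXX sequences with the character they encode.

    restore_escapes is a hypothetical helper name, not a library function.
    """
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


print(restore_escapes('Tru00e4ffa lu00e4kare'))
```

Note that the pattern cannot tell a stripped escape from ordinary text: any "u" that happens to be followed by four hex-digit characters (for example the "ubfac" inside "subfaced") is rewritten too, so it is worth checking the input for such words first.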

Upvotes: 1
