Reputation: 43
Is there an easy way in Python 3 to parse Unicode escape sequences such as u00e4 that are missing the leading backslash? I would like to replace each sequence with the character it represents. I have text like the one below.
Hju00e4lper dig, Tru00e4ffa lu00e4kare, sjuksku00f6terskor och psykologer mm
I can of course use some kind of regex matching, but is there an easier way of doing it in Python 3?
Upvotes: 0
Views: 152
Reputation: 177901
Use re.sub with a function that converts the four hex digits to a character:
>>> import re
>>> s='Hju00e4lper dig, Tru00e4ffa lu00e4kare, sjuksku00f6terskor och psykologer mm'
>>> re.sub('u([0-9a-f]{4})',lambda m: chr(int(m.group(1),16)),s)
'Hjälper dig, Träffa läkare, sjuksköterskor och psykologer mm'
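One caveat with the pattern above: it also matches ordinary words in which "u" happens to be followed by four letters from a-f, such as "scuffed" ("u" + "ffed"). Below is a minimal sketch of a stricter variant, assuming every real escape in the text contains at least one decimal digit (true for the u00e4 and u00f6 in the sample, but not for every code point). The name restore_escapes and that heuristic are illustrative assumptions, not part of the answer above; the sketch also accepts uppercase hex digits.

```python
import re

# Heuristic pattern: "u" followed by exactly four hex digits, at least one of
# which is a decimal digit (0-9). The lookahead is an assumption added here so
# that ordinary words such as "scuffed" ("u" + "ffed") are left alone.
_ESCAPE = re.compile(r"u(?=[0-9a-fA-F]{0,3}[0-9])([0-9a-fA-F]{4})")


def restore_escapes(text: str) -> str:
    """Replace backslash-less uXXXX sequences with the characters they encode."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


s = "Hju00e4lper dig, Tru00e4ffa lu00e4kare, sjuksku00f6terskor och psykologer mm"
print(restore_escapes(s))
# Hjälper dig, Träffa läkare, sjuksköterskor och psykologer mm
```

The trade-off is that an escape made up only of letters (for example uffed) would be skipped, so this only suits text where such code points are not expected.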
Upvotes: 1