Reputation: 531
I have an output of spannkr \xc3\xa4ftig, da\xc3\x9f unser
in Python. How do I replace this with umlauts?
Upvotes: 4
Views: 13693
Reputation: 55884
The German characters are already there, but encoded as utf-8. If you want to see the umlauts etc in the interpreter then you can decode to str
:
>>> bs = b'spannkr \xc3\xa4ftig, da\xc3\x9f unser'
>>> s = bs.decode('utf-8')
>>> print(s)
spannkr äftig, daß unser
It's possible that you are dealing with a str
that somehow contains utf-8 encoded data. In this case you need to perform an extra step:
>>> s = 'spannkr \xc3\xa4ftig, da\xc3\x9f unser'
>>> bs = s.encode('raw-unicode-escape') # encode to bytes without double-encoding
>>> print(bs)
b'spannkr \xc3\xa4ftig, da\xc3\x9f unser'
>>> decoded = bs.decode('utf-8')
>>> print(decoded)
spannkr äftig, daß unser
There isn't an easy way to distinguish between incorrectly embedded spaces and the spaces between words. You would need to use some kind of spellchecker or natural language application.
Upvotes: 10