How do I replace \xc3 etc. with umlauts?

Question

In Python I get the output spannkr \xc3\xa4ftig, da\xc3\x9f unser. How do I replace the \xc3... escape sequences with the actual umlauts?

snakecharmerb · Accepted Answer

The German characters are already there, but encoded as UTF-8 bytes. If you want to see the umlauts and ß in the interpreter, decode the bytes to str:

>>> bs = b'spannkr \xc3\xa4ftig, da\xc3\x9f unser'
>>> s = bs.decode('utf-8')
>>> print(s)
spannkr äftig, daß unser

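It's also possible that you are dealing with a str that somehow contains UTF-8 encoded data, that is, each character in the str is really one byte of the UTF-8 encoding. In this case you need an extra step: encode the str back to bytes first, then decode those bytes as UTF-8: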

>>> s = 'spannkr \xc3\xa4ftig, da\xc3\x9f unser'
>>> bs = s.encode('raw-unicode-escape')  # encode to bytes without double-encoding
>>> print(bs)
b'spannkr \xc3\xa4ftig, da\xc3\x9f unser'
>>> decoded = bs.decode('utf-8')
>>> print(decoded)
spannkr äftig, daß unser
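
If you don't know in advance which of the two cases you have, the two approaches can be combined. The following is only a sketch: to_text is a made-up helper name, and it uses 'latin-1' instead of 'raw-unicode-escape' for the str case (both map code points 0-255 one-to-one onto bytes, but 'latin-1' raises an error for anything above that instead of emitting \uXXXX escapes).

```python
def to_text(value):
    """Return readable text from UTF-8 bytes, or from a str that
    holds UTF-8 bytes as individual code points (hypothetical helper)."""
    if isinstance(value, bytes):
        # Case 1: real bytes, just decode them
        return value.decode('utf-8')
    try:
        # Case 2: turn each code point 0-255 back into one byte,
        # then decode the resulting bytes as UTF-8
        return value.encode('latin-1').decode('utf-8')
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The str was not mis-decoded UTF-8 (e.g. it is already
        # correct text), so leave it unchanged
        return value


print(to_text(b'spannkr \xc3\xa4ftig, da\xc3\x9f unser'))
print(to_text('spannkr \xc3\xa4ftig, da\xc3\x9f unser'))
```

Text that is already correct, such as 'daß', falls into the except branch and is returned as-is, so calling the helper twice should be harmless.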

Note that decoding does not fix the stray space inside "spannkr äftig". There isn't an easy way to distinguish such incorrectly embedded spaces from the genuine spaces between words. You would need to use some kind of spellchecker or natural language application for that.
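
As a rough illustration of the spellchecker idea, here is a naive sketch that merges two neighbouring tokens whenever the joined result is a known word. The function name rejoin_split_words and the one-word vocabulary are made up for this example; a real solution would need a proper German word list or spellchecking library, and this approach will miss or mis-merge many cases.

```python
def rejoin_split_words(text, vocabulary):
    """Merge adjacent tokens when their concatenation is a known word.

    vocabulary is a set of lowercase words (assumed to be supplied
    by the caller, e.g. loaded from a dictionary file).
    """
    tokens = text.split(' ')
    result = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens):
            joined = tokens[i] + tokens[i + 1]
            # Ignore surrounding punctuation and case for the lookup
            if joined.strip('.,;:!?').lower() in vocabulary:
                result.append(joined)
                i += 2
                continue
        result.append(tokens[i])
        i += 1
    return ' '.join(result)


# Hypothetical one-word vocabulary, just for demonstration
known_words = {'spannkräftig'}
print(rejoin_split_words('spannkr äftig, daß unser', known_words))
```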
