How do I get Python to recognize German characters, such as umlauts?

Question

I'm reading my dataframe from a CSV file using pd.read_csv().

The \x9f should be an umlaut:

'Heiner Dr\x9fke "Weil, Gotshal & Manges"'

I tried to no avail:

person1.encode('utf-8')

UnicodeDecodeError: 'ascii' codec can't decode byte 0x9f in position 9: ordinal not in range(128)

TRIED

I get this when I use macroman:

person1.decode('macroman')
Out[511]:
u'Heiner Dr\xfcke "Weil, Gotshal & Manges"'

However, print person1.decode('macroman') does print the umlaut. How do I capture this in a string?

person1.decode("cp1251")
Out[512]:
u'Heiner Dr\u045fke "Weil, Gotshal & Manges"'

Joran Beasley · Accepted Answer

Somehow your data is encoded as MacRoman ... it shouldn't be.

>>> print 'Heiner Dr\x9fke "Weil, Gotshal & Manges"'.decode("macroman")
Heiner Drüke "Weil, Gotshal & Manges"

This will decode it to a unicode string that Python understands ...
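For anyone on Python 3, here is a rough sketch of the same decode step. It assumes the data arrives as raw bytes (for example, from a file opened in binary mode); the names raw and person1 are only illustrative. Note that u'Dr\xfcke' in the question's output is already the right string: \xfc is just how Python 2's repr escapes ü, so the decoded value can be stored and reused as is.

```python
# Python 3 sketch: the raw bytes as they might arrive from the file.
raw = b'Heiner Dr\x9fke "Weil, Gotshal & Manges"'

# Decode once, using the encoding the bytes were actually written in.
# "mac_roman" is the canonical codec name; "macroman" is an alias for it.
person1 = raw.decode("mac_roman")

# person1 is now an ordinary text string containing U+00FC (u with umlaut).
print(repr(person1))
print(person1)
```

Keeping the result of decode() in a variable is all that "capturing it into a string" requires; print only changes how the value is displayed.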

If you want to encode it for a Google search:

'Heiner Dr\x9fke "Weil, Gotshal & Manges"'.decode("macroman").encode('ascii', 'xmlcharrefreplace')

should work fine.
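As a small illustration of what the xmlcharrefreplace step produces, here is a Python 3 sketch. It assumes the text has already been decoded; text and ascii_bytes are illustrative names only.

```python
# Python 3 sketch: start from the already-decoded text.
text = 'Heiner Dr\u00fcke "Weil, Gotshal & Manges"'

# Characters outside ASCII are replaced by numeric character references
# (&#NNN;); characters that are already ASCII, including & and ", are kept.
ascii_bytes = text.encode("ascii", "xmlcharrefreplace")

print(ascii_bytes)
# b'Heiner Dr&#252;ke "Weil, Gotshal & Manges"'
```

The error handler only touches characters that cannot be represented in ASCII, so the result is pure ASCII but is not escaped in any other way.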
