user3314418

Reputation: 3041

How do I get Python to recognize German characters, such as umlauts?

I was reading this question: python: open and read a file containing germanic umlaut as unicode

I'm reading my DataFrame from a CSV file using pd.read_csv().

The \x9f should be a ü (u with umlaut):

'Heiner Dr\x9fke "Weil, Gotshal & Manges"'

I tried the following, to no avail:

person1.encode('utf-8')

UnicodeDecodeError: 'ascii' codec can't decode byte 0x9f in position 9: ordinal not in range(128)
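The error comes from the fact that, in Python 2, calling `.encode()` on a byte string first decodes it implicitly with the default ASCII codec, and 0x9f is outside ASCII. A minimal Python 3 sketch of the same situation, assuming the raw bytes from the file are available as a `bytes` object (Python 3 has no implicit step, so the decode has to be written out):

```python
# Raw bytes as they appear in the file; 0x9f is the problem byte.
raw = b'Heiner Dr\x9fke "Weil, Gotshal & Manges"'

try:
    # This is the step Python 2 performed implicitly inside .encode().
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    # exc.start is the offset of the first byte ASCII cannot decode.
    print(exc.start, hex(raw[exc.start]))  # 9 0x9f

# Decoding with the codec the bytes were actually written in succeeds
# (Mac Roman is an assumption here, taken from the attempts below).
text = raw.decode("mac_roman")

# Only once you have text can you encode it, e.g. to UTF-8.
utf8 = text.encode("utf-8")
print(utf8[9:11])  # b'\xc3\xbc'
```

The position reported by the exception (9) matches the one in the traceback above.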

I also tried decoding it as Mac Roman:

person1.decode('macroman')
Out[511]:
u'Heiner Dr\xfcke "Weil, Gotshal & Manges"'

However, print person1.decode('macroman') does print the umlaut. How do I capture this in a string?

And with cp1251:

person1.decode("cp1251")
Out[512]:
u'Heiner Dr\u045fke "Weil, Gotshal & Manges"'
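In Python 2 the `u'...\xfc...'` shown at the prompt is only the repr: `\xfc` is U+00FC, which is ü, so the decoded value already holds the umlaut, and `print` simply displays it. To see why Mac Roman gives the right character and cp1251 does not, here is a Python 3 sketch that decodes the same byte with a few candidate codecs (the choice of candidates is an assumption, not something from the question):

```python
import unicodedata

raw = b'Heiner Dr\x9fke "Weil, Gotshal & Manges"'

# Decode the same bytes with several legacy codecs and report which
# character 0x9f (at offset 9) becomes under each one.
results = {}
for codec in ("mac_roman", "cp1252", "cp1251"):
    ch = raw.decode(codec)[9]
    results[codec] = ch
    print(f"{codec:10} U+{ord(ch):04X} {unicodedata.name(ch)}")

# mac_roman  U+00FC LATIN SMALL LETTER U WITH DIAERESIS
# cp1252     U+0178 LATIN CAPITAL LETTER Y WITH DIAERESIS
# cp1251     U+045F CYRILLIC SMALL LETTER DZHE
```

Only Mac Roman maps 0x9f to ü, which fits a German name; the cp1251 result is the Cyrillic letter seen in the output above.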

Upvotes: 0

Views: 3307

Answers (2)

Rachel Gallen

Reputation: 28573

u = u"profileDir_(\u00fc)" (\u00fc is ü, the u with umlaut), according to this reference.
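A small Python 3 sketch showing that the `\u00fc` escape, the named escape, and `chr(0xFC)` all produce the same character (in Python 3 the `u` prefix is optional, since every `str` is Unicode):

```python
import unicodedata

# Three equivalent ways to write "ü" in a Python 3 string literal.
by_escape = "\u00fc"
by_name = "\N{LATIN SMALL LETTER U WITH DIAERESIS}"
by_codepoint = chr(0xFC)

assert by_escape == by_name == by_codepoint
print(unicodedata.name(by_escape))  # LATIN SMALL LETTER U WITH DIAERESIS

# The answer's example, written for Python 3.
u = "profileDir_(\u00fc)"
```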

Upvotes: 1

Joran Beasley

Reputation: 114038

Somehow your data is encoded in Mac Roman... it shouldn't be.

>>> print 'Heiner Dr\x9fke "Weil, Gotshal & Manges"'.decode("macroman")
Heiner Drüke "Weil, Gotshal & Manges"

This decodes it to a unicode object that Python understands.
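Instead of decoding each value after the fact, the whole file can be decoded as it is read. `pd.read_csv` accepts an `encoding` argument, so something like `pd.read_csv(path, encoding='mac_roman')` should give properly decoded text, assuming the file really is Mac Roman. A stdlib-only Python 3 sketch of the same idea; the in-memory bytes and the `name`/`firm` columns are hypothetical stand-ins for the real CSV:

```python
import csv
import io

# Hypothetical stand-in for the CSV file's raw bytes, saved as Mac Roman.
raw_csv = b'name,firm\nHeiner Dr\x9fke,"Weil, Gotshal & Manges"\n'

# Wrap the bytes in a text stream that decodes them as Mac Roman.
# newline="" is what the csv module expects from its input stream.
with io.TextIOWrapper(io.BytesIO(raw_csv), encoding="mac_roman", newline="") as f:
    rows = list(csv.DictReader(f))

print(rows[0]["firm"])  # Weil, Gotshal & Manges
```

For a file on disk, `open(path, encoding="mac_roman", newline="")` plays the role of the `TextIOWrapper`.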

If you want to encode it for a Google search:

'Heiner Dr\x9fke "Weil, Gotshal & Manges"'.decode("macroman").encode('ascii', 'xmlcharrefreplace')

should work fine.
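The same decode-then-encode chain in Python 3, as a sketch assuming the input is a `bytes` object: the `xmlcharrefreplace` error handler replaces each character that ASCII cannot encode with an XML/HTML numeric character reference.

```python
raw = b'Heiner Dr\x9fke "Weil, Gotshal & Manges"'

# Decode from Mac Roman, then encode to ASCII, turning every
# non-ASCII character into a numeric reference such as &#252;.
ascii_bytes = raw.decode("mac_roman").encode("ascii", "xmlcharrefreplace")
print(ascii_bytes)  # b'Heiner Dr&#252;ke "Weil, Gotshal & Manges"'
```

Note that the handler only touches characters the target codec cannot encode; the literal `&` stays as-is, so HTML output may still need `html.escape`.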

Upvotes: 4
