Reputation: 3041
I was reading this question: "python: open and read a file containing germanic umlaut as unicode".
I'm reading my dataframe from a CSV file using pd.read_csv().
The \x9f in the following string should be an umlaut:
'Heiner Dr\x9fke "Weil, Gotshal & Manges"'
I tried to no avail:
person1.encode('utf-8')
UnicodeDecodeError: 'ascii' codec can't decode byte 0x9f in position 9: ordinal not in range(128)
When I decode with macroman, I get this:
person1.decode('macroman')
Out[511]:
u'Heiner Dr\xfcke "Weil, Gotshal & Manges"'
However, print person1.decode('macroman') does print out the umlaut.
How do I capture this into a string?
I also tried cp1251, which gives a different character:
person1.decode("cp1251")
Out[512]:
u'Heiner Dr\u045fke "Weil, Gotshal & Manges"'
Upvotes: 0
Views: 3307
Reputation: 28573
u = u"profileDir_(\u00fc)"
(u umlaut) according to this reference
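A small sketch to check that code point and to show why the codec matters. It is written in Python 3 syntax (an assumption, since the thread uses Python 2), and the variable names are only illustrative. The same byte 0x9f maps to different characters depending on which codec decodes it:

```python
import unicodedata

# The code point this answer refers to.
u_umlaut = "\u00fc"
name = unicodedata.name(u_umlaut)

# The single byte from the question, decoded with the two codecs tried there.
raw = b"\x9f"
as_macroman = raw.decode("mac_roman")  # the u umlaut
as_cp1251 = raw.decode("cp1251")       # a Cyrillic letter, U+045F

print(name)
print(as_macroman == u_umlaut)
```

Only the macroman decoding yields the umlaut, which matches the two Out[] results shown in the question.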
Upvotes: 1
Reputation: 114038
Somehow your data ended up encoded as macroman, which it shouldn't be. You can decode it like this:
>>> print 'Heiner Dr\x9fke "Weil, Gotshal & Manges"'.decode("macroman")
Heiner Drüke "Weil, Gotshal & Manges"
This decodes it to a unicode object that Python understands.
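To the "how do I capture this into a string" part: the decoded value already is that string. The \xfc in the Out[] line is just the escaped display of the same character that print renders as ü. The original error came from calling .encode('utf-8') on a byte string, which in Python 2 first tries an implicit ASCII decode. A minimal sketch, assuming Python 3 syntax (where byte strings are written b'...') and using illustrative variable names:

```python
# Raw bytes as they came out of the CSV (macroman-encoded).
raw = b'Heiner Dr\x9fke "Weil, Gotshal & Manges"'

# Decode once; the result is a real text string containing the umlaut.
person = raw.decode("mac_roman")

# ascii() shows the escaped form, like the Out[] line in the question.
escaped_view = ascii(person)

# Once decoded, re-encoding to UTF-8 works without errors.
utf8_bytes = person.encode("utf-8")

print(person)
print(escaped_view)
```

Keeping the decoded value around and encoding only at output time avoids mixing byte strings and text.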
If you want to encode it for a Google search, this should work fine:
'Heiner Dr\x9fke "Weil, Gotshal & Manges"'.decode("macroman").encode('ascii', 'xmlcharrefreplace')
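A sketch of what that produces, again assuming Python 3 syntax. The 'xmlcharrefreplace' error handler swaps each non-ASCII character for a numeric character reference and leaves ASCII characters alone. The urllib.parse.quote_plus call is not part of this answer; it is shown only as a possible alternative if the text needs to go into a URL query string:

```python
from urllib.parse import quote_plus

raw = b'Heiner Dr\x9fke "Weil, Gotshal & Manges"'
person = raw.decode("mac_roman")

# Non-ASCII characters become numeric references such as &#252;
ascii_bytes = person.encode("ascii", "xmlcharrefreplace")

# Alternative (an assumption about the use case): percent-encode as UTF-8
# for use inside a URL query string.
query = quote_plus(person)

print(ascii_bytes)
print(query)
```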
Upvotes: 4