I have a pandas DataFrame, loaded from an Excel file, as input to my program.
I would like to replace some non-ASCII characters in that DataFrame.
import pandas as pd
XList = ['Meßi', 'Ürik']
YList = ['01.01.1970', '01.01.1990']
df = pd.DataFrame({'X': XList,
                   'Y': YList})
      X           Y
0  Meßi  01.01.1970
1  Ürik  01.01.1990
I would like to define some replacement rules, e.g. ß -> ss and Ü -> UE,
and get this result:
       X           Y
0  Messi  01.01.1970
1  UErik  01.01.1990
Note: I'm using Python 2.7.
UPDATE:
Solved using the answer below, after configuring Eclipse as follows:
1. Change the text file encoding in Eclipse to UTF-8.
How to: How to use Special Chars in Java/Eclipse
2. Add the following encoding declaration as the first line of the script:
# -*- coding: UTF-8 -*-
http://www.vogella.com/tutorials/Python/article.html
Upvotes: 2
One way is to create a dict of replacement rules, iterate over its key/value pairs, and call str.replace for each pair:
In [42]:
repl_dict = {'ß': 'ss', 'Ü': 'UE'}
for k, v in repl_dict.items():
    # Assign only to rows containing k; the replaced values align on the index
    df.loc[df.X.str.contains(k), 'X'] = df.X.str.replace(pat=k, repl=v)
df
Out[42]:
       X           Y
0  Messi  01.01.1970
1  UErik  01.01.1990
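The loop above scans the column once per rule. As an alternative sketch, assuming Python 3 (where `str.maketrans` accepts a dict of single-character keys), all rules can be applied in one pass per string. The names `RULES`, `TABLE` and `transliterate` are hypothetical and not part of the original answer.

```python
# Replacement rules from the question; extend as needed.
RULES = {'ß': 'ss', 'Ü': 'UE'}

# str.maketrans accepts a dict whose keys are single characters
# and whose values are replacement strings of any length.
TABLE = str.maketrans(RULES)


def transliterate(text):
    """Apply every rule in RULES to text in a single pass."""
    return text.translate(TABLE)


names = ['Meßi', 'Ürik']
converted = [transliterate(name) for name in names]
print(converted)
```

With a DataFrame, the same function could presumably be applied with `df['X'] = df['X'].map(transliterate)`, assuming the column contains only strings (missing values would need separate handling).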
EDIT
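For editors that don't allow non-ASCII characters in the Python script, you can write the characters to be replaced as Unicode escape sequences instead (on Python 2.7 these are only interpreted inside unicode literals, e.g. u'\u00DF'):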
In [72]:
repl_dict = {'\u00DF': 'ss', '\u00DC': 'UE'}
for k, v in repl_dict.items():
    # Assign only to rows containing k; the replaced values align on the index
    df.loc[df.X.str.contains(k), 'X'] = df.X.str.replace(pat=k, repl=v)
df
Out[72]:
       X           Y
0  Messi  01.01.1970
1  UErik  01.01.1990
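To confirm which characters the escape sequences stand for, a small check with the standard library's `unicodedata` module may help. This is a sketch assuming Python 3; the name `escaped_rules` is hypothetical.

```python
import unicodedata

# The same rules as above, written with ASCII-only escape sequences.
escaped_rules = {'\u00DF': 'ss', '\u00DC': 'UE'}

# Look up the official Unicode name of each key to verify the code points.
char_names = {char: unicodedata.name(char) for char in escaped_rules}

for char, name in char_names.items():
    print(ascii(char), name)
```

U+00DF is named LATIN SMALL LETTER SHARP S and U+00DC is LATIN CAPITAL LETTER U WITH DIAERESIS, so the escaped rules target the same characters as the literal ones.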
Upvotes: 1