Reputation: 63
I have a dataframe which is like the one below:
Name | City
Apple | Tokyo
Papaya | Pune
TimGru334 | Shanghai
236577 | Delhi
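I need to iterate through each value and tokenise the data in Python. To explain in detail: every digit should become 'd' and every other character should become 'c', so TimGru334 becomes ccccccddd.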
Can someone help me out please?
P.S: I'm new to the platform, so please excuse me if I'm wrong in any manner. Thanks in advance :)
Upvotes: 4
Views: 63
Reputation: 71580
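Use str.replace with regex=True (recent pandas versions treat the pattern as a literal string by default):
df['Name'] = df['Name'].str.replace(r'\D', 'c', regex=True).str.replace(r'\d', 'd', regex=True)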
Now print(df) gives:
Name City
0 ccccc Tokyo
1 cccccc Pune
2 ccccccddd Shanghai
3 dddddd Delhi
To do all columns, use @jezrael's answer, or apply the same replacements column by column:
df = df.apply(lambda x: x.str.replace(r'\D', 'c', regex=True).str.replace(r'\d', 'd', regex=True))
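For a single value, the same mapping can be sketched in plain Python with the standard re module. This is only an illustration, not part of the pandas API: the helper name tokenise is made up here, and it does the work in one pass instead of two chained replacements.

```python
import re

def tokenise(value):
    """Hypothetical helper: map each digit to 'd' and every other character to 'c'."""
    text = str(value)  # values such as 236577 may arrive as integers
    # Group 1 is set only when the matched character is a digit.
    return re.sub(r"(\d)|\D", lambda m: "d" if m.group(1) else "c", text)

print(tokenise("TimGru334"))  # ccccccddd
print(tokenise(236577))       # dddddd
```

Assuming the column holds strings or numbers, something like df['Name'].map(tokenise) should apply it row by row, although the vectorised str.replace calls above are the more usual pandas route.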
Upvotes: 3
Reputation: 862641
Use Series.replace, first for non-numeric and then for numeric values. The order of the values in the lists is important:
df['Name'] = df['Name'].replace([r'\D', r'\d'], ['c', 'd'], regex=True)
print(df)
Name City
0 ccccc Tokyo
1 cccccc Pune
2 ccccccddd Shanghai
3 dddddd Delhi
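Why the order matters can be seen with a small sketch that uses Python's re module directly instead of pandas (the variable names are only for illustration): if digits are replaced first, the 'd' placeholders are themselves non-digits and get overwritten by the second pass.

```python
import re

value = "TimGru334"

# Non-digits first, then digits: the 'c' placeholders are not digits, so they survive.
right = re.sub(r"\d", "d", re.sub(r"\D", "c", value))

# Digits first, then non-digits: the 'd' placeholders count as non-digits and become 'c'.
wrong = re.sub(r"\D", "c", re.sub(r"\d", "d", value))

print(right)  # ccccccddd
print(wrong)  # ccccccccc
```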
If you need to replace in all columns:
df = df.replace([r'\D', r'\d'], ['c', 'd'], regex=True)
print(df)
Name City
0 ccccc ccccc
1 cccccc cccc
2 ccccccddd cccccccc
3 dddddd ccccc
Upvotes: 4