Proton

Reputation: 63

Tokenize data in Python (converting data into patterns)

I have a dataframe which is like the one below:

Name      | City

Apple     | Tokyo
Papaya    | Pune
TimGru334 | Shanghai
236577    | Delhi

I need to iterate through each value and tokenise the data in Python. To explain in detail: every non-digit character should be replaced with 'c' and every digit with 'd', so for example 'TimGru334' would become 'ccccccddd' and '236577' would become 'dddddd'.

Can someone help me out please?

P.S: I'm new to the platform, so please excuse me if I'm wrong in any manner. Thanks in advance :)

Upvotes: 4

Views: 63

Answers (2)

U13-Forward

Reputation: 71580

Use str.replace:

df['Name'] = df['Name'].str.replace(r'\D', 'c', regex=True).str.replace(r'\d', 'd', regex=True)

And now:

print(df)

Is:

        Name      City
0      ccccc     Tokyo
1     cccccc      Pune
2  ccccccddd  Shanghai
3     dddddd     Delhi

To do all columns, use @jezrael's answer, otherwise use:

df = df.apply(lambda x: x.str.replace(r'\D', 'c', regex=True).str.replace(r'\d', 'd', regex=True))
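For completeness, a minimal runnable sketch of this approach, rebuilding the example frame from the question's table. Replacing non-digits first matters: the inserted 'c' characters are themselves non-digits, so they must not be produced after the digit pass has already run.

```python
import pandas as pd

# Rebuild the example frame from the question
df = pd.DataFrame({
    'Name': ['Apple', 'Papaya', 'TimGru334', '236577'],
    'City': ['Tokyo', 'Pune', 'Shanghai', 'Delhi'],
})

# First map every non-digit to 'c', then every digit to 'd'
df['Name'] = (df['Name']
              .str.replace(r'\D', 'c', regex=True)
              .str.replace(r'\d', 'd', regex=True))

print(df['Name'].tolist())  # → ['ccccc', 'cccccc', 'ccccccddd', 'dddddd']
```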

Upvotes: 3

jezrael

Reputation: 862641

Use Series.replace - replace non-numeric values first, then numeric ones; the order of the patterns in the lists is important:

df['Name'] = df['Name'].replace([r'\D', r'\d'], ['c','d'], regex=True)
print (df)
        Name      City
0      ccccc     Tokyo
1     cccccc      Pune
2  ccccccddd  Shanghai
3     dddddd     Delhi

If need replace all columns:

df = df.replace([r'\D', r'\d'], ['c','d'], regex=True)
print (df)
        Name      City
0      ccccc     ccccc
1     cccccc      cccc
2  ccccccddd  cccccccc
3     dddddd     ccccc
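A self-contained sketch of this single-call variant, again rebuilding the frame from the question. The patterns are applied in list order, so non-digits become 'c' before digits become 'd'; swapping them would let r'\D' match the freshly inserted 'd' characters.

```python
import pandas as pd

# Rebuild the example frame from the question
df = pd.DataFrame({
    'Name': ['Apple', 'Papaya', 'TimGru334', '236577'],
    'City': ['Tokyo', 'Pune', 'Shanghai', 'Delhi'],
})

# One call handles every column of the frame at once
df = df.replace([r'\D', r'\d'], ['c', 'd'], regex=True)

print(df['City'].tolist())  # → ['ccccc', 'cccc', 'cccccccc', 'ccccc']
```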

Upvotes: 4
