Natasha

Reputation: 1521

Map partial string from dictionary in Pandas (again)

This is a follow-up to a previous post, "Map partial string from dictionary in Pandas".

I modified the mapping dictionary a bit, so that some keys are now substrings of other keys:

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randint(0,10,size=(5, 1)), columns=list('A'))
df.insert(0, 'n', ['abcde Germany fffe','aaaa Norway bbbb',
                   'tttt Sweden','Croatia dfdfdf','Italy sfsd'])

d = {'Germany':0.5, 'Croatia':1.5, 'Italy':1.5, 'Ital':1, 'German':0.9}

df['multiple'] = 1
for k, v in d.items():
    df['multiple'] = np.where(df['n'].str.contains(k), v, df['multiple'])

print(df)

Obtained output:

                    n  A  multiple
0  abcde Germany fffe  3       0.9
1    aaaa Norway bbbb  7       1.0
2         tttt Sweden  5       1.0
3      Croatia dfdfdf  8       1.5
4          Italy sfsd  3       1.0

Expected:

                    n  A  multiple
0  abcde Germany fffe  3       0.5
1    aaaa Norway bbbb  7       1.0
2         tttt Sweden  5       1.0
3      Croatia dfdfdf  8       1.5
4          Italy sfsd  3       1.5

Suggestions on how to obtain the expected output will be really helpful.

Upvotes: 1

Views: 360

Answers (1)

anky

Reputation: 75120

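Here is one approach (similar to the one in the linked post): extract the dictionary key that appears as a whole word in each string, map it to its value with Series.map, then use fillna(1) for rows with no match. The \b word boundaries keep shorter keys such as 'German' from matching inside 'Germany':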

pat = r'\b(?:{})\b'.format('|'.join(d.keys()))
df['multiple'] = df['n'].str.extract('('+pat+')',expand=False).map(d).fillna(1)

print(df)
                    n  A  multiple
0  abcde Germany fffe  5       0.5
1    aaaa Norway bbbb  4       1.0
2         tttt Sweden  1       1.0
3      Croatia dfdfdf  8       1.5
4          Italy sfsd  0       1.5
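
For reference, a self-contained sketch of the same approach. Two things here are assumptions rather than part of the snippet above: the random column A is dropped so the result is reproducible, and each key is passed through re.escape in case a key ever contains regex metacharacters (not needed for the keys in the question).

```python
import re

import pandas as pd

# Same strings as in the question; column A is omitted because it is random.
df = pd.DataFrame({
    'n': ['abcde Germany fffe', 'aaaa Norway bbbb',
          'tttt Sweden', 'Croatia dfdfdf', 'Italy sfsd'],
})
d = {'Germany': 0.5, 'Croatia': 1.5, 'Italy': 1.5, 'Ital': 1, 'German': 0.9}

# Build one alternation of all keys, wrapped in word boundaries so that
# only whole words match. re.escape is an added precaution (assumption).
pat = r'\b({})\b'.format('|'.join(re.escape(k) for k in d))

# Extract the first whole-word key per row, map it to its value,
# and fall back to 1 where no key matches.
df['multiple'] = df['n'].str.extract(pat, expand=False).map(d).fillna(1)

print(df)
```

Because the match must end at a word boundary, 'German' and 'Ital' cannot match inside 'Germany' and 'Italy', so the order of the keys in the dictionary no longer matters. In the original loop, every later key whose substring matched overwrote the earlier result.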

Upvotes: 1
