Map partial string from dictionary in Pandas

Question

I would like to map partial strings from dictionary keys to a Series, like this:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0, 10, size=(5, 1)), columns=list('A'))
df.insert(0, 'n', ['abcde Germany fffe','aaaa Norway bbbb',
                   'tttt Sweden','Croatia dfdfdf','Italy sfsd'])

>>> df

    n                   A
0   abcde Germany fffe  2
1   aaaa Norway bbbb    1
2   tttt Sweden         4
3   Croatia dfdfdf      1
4   Italy sfsd          2

d = {'Germany':0.5, 'Croatia':1.5, 'Italy':1.5}

Now I would like to match d's keys as partial strings against the n column and set multiple to the corresponding value. I achieved this with an ugly loop:

df['multiple'] = 1
for k, v in d.items():
    df['multiple'] = np.where(df['n'].str.contains(k), v, df['multiple'])

>>> df

    n                   A   multiple
0   abcde Germany fffe  2   0.5
1   aaaa Norway bbbb    1   1.0
2   tttt Sweden         4   1.0
3   Croatia dfdfdf      1   1.5
4   Italy sfsd          2   1.5

Is there a better, more Pandasly way? Thanks!

piRSquared · Accepted Answer

This is what I came up with.

Solution

pat = r'({})'.format('|'.join(d.keys()))
extracted = df.n.str.extract(pat, expand=False).dropna()

df['multiple'] = extracted.apply(lambda x: d[x]).reindex(df.index).fillna(1)

Demonstration

print(df)

                    n  A  multiple
0  abcde Germany fffe  5       0.5
1    aaaa Norway bbbb  3       1.0
2         tttt Sweden  7       1.0
3      Croatia dfdfdf  5       1.5
4          Italy sfsd  9       1.5

Explanation

pat looks like r'(Croatia|Italy|Germany)', a regular expression that matches any one of the alternatives separated by '|' inside the parentheses. Passed to str.extract, it returns the country that matched in each row, or NaN where none did. apply then looks up each matched country in the dictionary. Because not every value in the Series contains a key from the dict, we dropna before the lookup, then reindex back to df.index and fillna(1) so that unmatched rows get the default multiple of 1.
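
For reference, the same steps can be sketched as one self-contained script. This is a variant rather than the exact code above: it assumes Series.map in place of apply with a lambda (so the dropna and reindex steps are not needed), it escapes the keys with re.escape in case a key contains regex metacharacters, and it uses fixed values for column A instead of random ones.

```python
import re
import pandas as pd

# Fixed sample data (the values in A are illustrative, not random).
df = pd.DataFrame({
    'n': ['abcde Germany fffe', 'aaaa Norway bbbb', 'tttt Sweden',
          'Croatia dfdfdf', 'Italy sfsd'],
    'A': [2, 1, 4, 1, 2],
})
d = {'Germany': 0.5, 'Croatia': 1.5, 'Italy': 1.5}

# Escape each key so regex metacharacters in a key are matched literally.
pat = '({})'.format('|'.join(re.escape(k) for k in d))

# extract returns the matched key, or NaN where no key occurs in the string.
matched = df['n'].str.extract(pat, expand=False)

# map looks each matched key up in d; unmatched rows stay NaN
# and are then filled with the default multiple of 1.
df['multiple'] = matched.map(d).fillna(1)

print(df)
```

If a string contains more than one key, str.extract returns only the first match, so only that key's value is applied.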
