smiss

Reputation: 55

Map partial string from dictionary in Pandas

I would like to map dictionary keys onto a Series by partial string match, like this:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0,10,size=(5, 1)), columns=list('A'))
df.insert(0, 'n', ['abcde Germany fffe','aaaa Norway bbbb',
                   'tttt Sweden','Croatia dfdfdf','Italy sfsd'])

>>> df

    n                   A
0   abcde Germany fffe  2
1   aaaa Norway bbbb    1
2   tttt Sweden         4
3   Croatia dfdfdf      1
4   Italy sfsd          2

d = {'Germany':0.5, 'Croatia':1.5, 'Italy':1.5}

Now I would like to match d's keys against the n column as partial strings and set multiple to the corresponding value, defaulting to 1 where no key matches. I achieved this with an ugly loop:

df['multiple'] = 1
for k, v in d.items():  # items() works on Python 2 and 3; iteritems() is Python 2 only
    df['multiple'] = np.where(df['n'].str.contains(k), v, df['multiple'])

>>> df

    n                   A   multiple
0   abcde Germany fffe  2   0.5
1   aaaa Norway bbbb    1   1.0
2   tttt Sweden         4   1.0
3   Croatia dfdfdf      1   1.5
4   Italy sfsd          2   1.5

Is there a better, more Pandas-like way? Thanks!

Upvotes: 2

Views: 3531

Answers (2)

su79eu7k

Reputation: 7316

# expand=False makes str.extract return a Series, which is what .map needs
df['multiple'] = df['n'].str.extract('('+'|'.join(list(d))+')', expand=False).map(d).fillna(1)
print(df)

                    n  A  multiple
0  abcde Germany fffe  7       0.5
1    aaaa Norway bbbb  0       1.0
2         tttt Sweden  3       1.0
3      Croatia dfdfdf  8       1.5
4          Italy sfsd  4       1.5

Upvotes: 3

piRSquared

Reputation: 294288

This is what I came up with

Solution

pat = r'({})'.format('|'.join(d.keys()))
extracted = df.n.str.extract(pat, expand=False).dropna()

df['multiple'] = extracted.apply(lambda x: d[x]).reindex(df.index).fillna(1)

Demonstration

print(df)

                    n  A  multiple
0  abcde Germany fffe  5       0.5
1    aaaa Norway bbbb  3       1.0
2         tttt Sweden  7       1.0
3      Croatia dfdfdf  5       1.5
4          Italy sfsd  9       1.5

Explanation

pat looks like r'(Croatia|Italy|Germany)', a regular expression that matches any one of the alternatives separated by '|' inside the parentheses. Passed to str.extract, it returns the country that matched in each row, or NaN when none did. The apply then looks up each matched country in the dictionary. Because not every value in the series contains a key from the dict, we dropna before the lookup, then reindex to restore the dropped rows and fillna(1) to give them the default.
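
For reference, here is a self-contained sketch of the same idea that should run on Python 3. Two things in it are my own additions rather than part of the solution above: the keys are passed through re.escape, on the assumption that a key might someday contain a regex metacharacter such as '.' or '+', and Series.map replaces the dropna/apply/reindex steps, since unmatched rows simply stay NaN until fillna. The A values are fixed to the ones from the question so the output is reproducible.

```python
import re

import pandas as pd

df = pd.DataFrame({
    'n': ['abcde Germany fffe', 'aaaa Norway bbbb',
          'tttt Sweden', 'Croatia dfdfdf', 'Italy sfsd'],
    'A': [2, 1, 4, 1, 2],
})
d = {'Germany': 0.5, 'Croatia': 1.5, 'Italy': 1.5}

# Escape each key so it is matched literally (assumption: keys are plain text, not patterns)
pat = '({})'.format('|'.join(re.escape(k) for k in d))

# expand=False returns a Series; rows without a match become NaN
matched = df['n'].str.extract(pat, expand=False)

# Look each matched key up in d; NaN stays NaN and then falls back to the default of 1
df['multiple'] = matched.map(d).fillna(1)
print(df)
```

If one key is a prefix of another (say 'Ger' and 'Germany'), the alternation picks whichever comes first in the pattern, so sorting the keys longest first may be worth considering in that case.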

Upvotes: 5
