Reputation: 55
I'd like to map dictionary keys to a Series by matching them as partial strings. Here is my setup:
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0, 10, size=(5, 1)), columns=list('A'))
df.insert(0, 'n', ['abcde Germany fffe', 'aaaa Norway bbbb',
                   'tttt Sweden', 'Croatia dfdfdf', 'Italy sfsd'])
>>> df
n A
0 abcde Germany fffe 2
1 aaaa Norway bbbb 1
2 tttt Sweden 4
3 Croatia dfdfdf 1
4 Italy sfsd 2
d = {'Germany':0.5, 'Croatia':1.5, 'Italy':1.5}
Now I'd like to match d's keys against the n column as partial strings and set the corresponding multiple (defaulting to 1 when no key matches). I achieved this with an ugly loop:
df['multiple'] = 1
for k, v in d.items():  # d.items() works on Python 2 and 3; iteritems() is Python 2 only
    df['multiple'] = np.where(df['n'].str.contains(k), v, df['multiple'])
>>> df
n A multiple
0 abcde Germany fffe 2 0.5
1 aaaa Norway bbbb 1 1.0
2 tttt Sweden 4 1.0
3 Croatia dfdfdf 1 1.5
4 Italy sfsd 2 1.5
Is there a better, more Pandas-like way? Thanks!
Upvotes: 2
Views: 3531
Reputation: 7316
# expand=False makes str.extract return a Series (not a DataFrame), so .map(d) works
df['multiple'] = df['n'].str.extract('(' + '|'.join(d) + ')', expand=False).map(d).fillna(1)
print(df)
n A multiple
0 abcde Germany fffe 7 0.5
1 aaaa Norway bbbb 0 1.0
2 tttt Sweden 3 1.0
3 Croatia dfdfdf 8 1.5
4 Italy sfsd 4 1.5
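A step-by-step sketch of the one-liner above, assuming a recent pandas version. With expand=False, str.extract returns a Series holding one captured country per row (NaN where nothing matched) rather than a one-column DataFrame, so Series.map(d) can look each country up in the dict, and fillna(1) supplies the default multiple for unmatched rows. The fixed values in column A are an assumption for reproducibility.

```python
import pandas as pd

df = pd.DataFrame({
    'n': ['abcde Germany fffe', 'aaaa Norway bbbb',
          'tttt Sweden', 'Croatia dfdfdf', 'Italy sfsd'],
    'A': [2, 1, 4, 1, 2],  # assumed fixed values instead of random ones
})
d = {'Germany': 0.5, 'Croatia': 1.5, 'Italy': 1.5}

# Alternation pattern with a single capture group: (Germany|Croatia|Italy)
pat = '(' + '|'.join(d) + ')'

# One matched country per row, NaN where no key occurs in the string
countries = df['n'].str.extract(pat, expand=False)

# Dict lookup per country; unmatched rows stay NaN and then default to 1
df['multiple'] = countries.map(d).fillna(1)
print(df)
```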
Upvotes: 3
Reputation: 294288
This is what I came up with:
pat = r'({})'.format('|'.join(d.keys()))
extracted = df.n.str.extract(pat, expand=False).dropna()
df['multiple'] = extracted.apply(lambda x: d[x]).reindex(df.index).fillna(1)
print(df)
n A multiple
0 abcde Germany fffe 5 0.5
1 aaaa Norway bbbb 3 1.0
2 tttt Sweden 7 1.0
3 Croatia dfdfdf 5 1.5
4 Italy sfsd 9 1.5
pat looks like r'(Croatia|Italy|Germany)', a regular expression that matches any one of the alternatives separated by '|' inside the parentheses. When used in the str.extract method, it returns the country matched in each row (NaN where nothing matched). Then apply looks up the dictionary value for each match. Not every Series value is matched by a key in the dict, so we must dropna first and fillna later.
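The intermediate values described above can be inspected with a small sketch like the following (fixed data assumed). It also wraps each key in re.escape, an addition not in the original answer, so that keys containing regex metacharacters such as '.' or '(' would be matched literally. If one string contained several keys, str.extract would return the leftmost match.

```python
import re

import pandas as pd

df = pd.DataFrame({
    'n': ['abcde Germany fffe', 'aaaa Norway bbbb',
          'tttt Sweden', 'Croatia dfdfdf', 'Italy sfsd'],
    'A': [5, 3, 7, 5, 9],  # assumed fixed values instead of random ones
})
d = {'Germany': 0.5, 'Croatia': 1.5, 'Italy': 1.5}

# re.escape makes each key match literally (an addition to the original answer)
pat = r'({})'.format('|'.join(re.escape(k) for k in d))
print(pat)  # (Germany|Croatia|Italy)

# Keep only the rows where some key was found (index 0, 3 and 4 here)
extracted = df.n.str.extract(pat, expand=False).dropna()
print(extracted)

# Look up the multiple per matched row, restore the full index, default to 1
df['multiple'] = extracted.apply(lambda x: d[x]).reindex(df.index).fillna(1)
print(df)
```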
Upvotes: 5