Reputation: 101
I need to search a DataFrame column for any string from a list and write the matching string to a new column. The code below works, but it is horribly inefficient, and my DataFrame has millions of rows.
import pandas as pd
Cars = {'MakeModel': ['HondaCivic','Toyota_Corolla','FordFocus','Audi--A4']}
df = pd.DataFrame(data=Cars)
mlist = ['Honda','Toyota','Ford','Audi']
for i in df.index:
    for x in mlist:
        # get_value/set_value were removed in pandas 1.0; .at and .loc replace them
        if x in df.at[i, 'MakeModel']:
            df.loc[i, 'Make'] = x
Upvotes: 1
Views: 74
Reputation: 402844
Let's use str.extract with a capture group here. This extracts the "make" from each cell if it exists, or inserts NaN in that row otherwise.
import re
df['Make'] = df['MakeModel'].str.extract(
    r'({})'.format('|'.join(map(re.escape, mlist))), expand=False)
df
        MakeModel    Make
0      HondaCivic   Honda
1  Toyota_Corolla  Toyota
2       FordFocus    Ford
3        Audi--A4    Audi
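To illustrate the NaN behaviour for rows with no match, here is a minimal sketch. The row 'TeslaModel3' is a made-up value that is not in the original data; it is there only because nothing in mlist matches it.

```python
import re
import pandas as pd

# 'TeslaModel3' is a hypothetical extra row with no corresponding entry in mlist
df = pd.DataFrame({'MakeModel': ['HondaCivic', 'Toyota_Corolla', 'TeslaModel3']})
mlist = ['Honda', 'Toyota', 'Ford', 'Audi']

# Same pattern construction as in the answer: one capture group of alternatives
pattern = r'({})'.format('|'.join(map(re.escape, mlist)))
df['Make'] = df['MakeModel'].str.extract(pattern, expand=False)

# Rows without a match hold NaN, so isna() selects them
unmatched = df[df['Make'].isna()]
print(unmatched)
```

Selecting the unmatched rows like this can be a quick way to check whether mlist is missing any makes.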
map(re.escape, mlist) can be replaced with mlist if you're sure your mlist strings do not contain any regex meta-characters which require escaping.
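As a rough illustration of why the escaping matters, the sketch below uses a hypothetical make 'A.C', whose '.' is a regex meta-character. The data and the names escaped, unescaped, with_escape and without_escape are invented for this example.

```python
import re
import pandas as pd

# Hypothetical data: 'A.C' contains '.', which matches any character in a regex
df = pd.DataFrame({'MakeModel': ['A.C_Cobra', 'ABCRoadster']})
mlist = ['A.C']

escaped = '({})'.format('|'.join(map(re.escape, mlist)))  # literal dot
unescaped = '({})'.format('|'.join(mlist))                # dot is a wildcard

with_escape = df['MakeModel'].str.extract(escaped, expand=False)
without_escape = df['MakeModel'].str.extract(unescaped, expand=False)

print(with_escape.tolist())     # second row has no match
print(without_escape.tolist())  # second row wrongly yields 'ABC'
```

Without escaping, the pattern reports 'ABC' as a make for the second row, even though that string is not in mlist.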
Upvotes: 1