bikerider

Reputation: 101

Python dataframe matching strings in a list

I need to search a dataframe column for any of the strings in a list and write the matching string into a new column of the dataframe. The code below works, but it is horribly inefficient, and my dataframe has millions of rows.

import pandas as pd

Cars = {'MakeModel': ['HondaCivic', 'Toyota_Corolla', 'FordFocus', 'Audi--A4']}
df = pd.DataFrame(data=Cars)

mlist = ['Honda', 'Toyota', 'Ford', 'Audi']

# Check every row against every make, one cell at a time
for i in df.index:
    for x in mlist:
        if x in df.at[i, 'MakeModel']:  # scalar lookup by label
            df.at[i, 'Make'] = x        # scalar assignment by label

Upvotes: 1

Views: 74

Answers (1)

cs95

Reputation: 402844

Let's use str.extract with a capture group here. This extracts the first matching "make" from each cell, or inserts NaN in rows where none of the makes match.

import re

df['Make'] = df['MakeModel'].str.extract(
    r'({})'.format('|'.join(map(re.escape, mlist))), expand=False)
df
        MakeModel    Make
0      HondaCivic   Honda
1  Toyota_Corolla  Toyota
2       FordFocus    Ford
3        Audi--A4    Audi

map(re.escape, mlist) can be replaced with plain mlist if you're sure the strings in mlist do not contain any regex metacharacters that would need escaping.
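
To illustrate why the escaping matters and what happens to rows with no match, here is a minimal sketch. The data is made up for the example: 'Alfa.Romeo' is a hypothetical make containing a literal dot, and the last two rows are there only to show the non-matching case.

```python
import re
import pandas as pd

# Hypothetical data: a make with a regex metacharacter ('.') and two
# rows that should not match any make.
df = pd.DataFrame({'MakeModel': ['HondaCivic',
                                 'Alfa.RomeoGiulia',
                                 'AlfaXRomeoGiulia',
                                 'TeslaModel3']})
mlist = ['Honda', 'Alfa.Romeo']

# Escape each make so it is matched literally, then join with '|'
# inside a single capture group.
pattern = '({})'.format('|'.join(map(re.escape, mlist)))

df['Make'] = df['MakeModel'].str.extract(pattern, expand=False)

# Rows 0 and 1 get 'Honda' and 'Alfa.Romeo'.
# Row 2 gets NaN because the escaped '.' only matches a literal dot.
# Row 3 gets NaN because no make in mlist occurs in it.
print(df)
```

Without re.escape, the '.' in the pattern would match any character, so row 2 would be labelled 'AlfaXRomeo' instead of being left as NaN.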

Upvotes: 1
