Christopher Goings

Reputation: 159

Search Pandas Column for Substring in other Column

I have an example .csv, imported as df, as follows:

    Ethnicity, Description
  0 French, Irish Dance Company
  1 Italian, Moroccan/Algerian
  2 Danish, Company in Netherlands
  3 Dutch, French
  4 English, EnglishFrench
  5 Irish, Irish-American

I'd like to check df['Description'] for strings in df['Ethnicity']. This should return rows 0, 3, 4, and 5, as those description strings contain strings from the Ethnicity column.

So far I've tried:

df[df['Ethnicity'].str.contains('French')]['Description']

This matches one specific string, but I'd like to check every ethnicity value without searching for each one by hand. I've also tried converting the columns to lists and iterating through them, but I can't find a way to return the matching rows, as the data is no longer a DataFrame.

Thank you in advance!

Upvotes: 4

Views: 2822

Answers (2)

jezrael

Reputation: 862406

You can use str.contains with the values of the Ethnicity column converted to a list and joined with |, which means "or" in a regex:

print ('|'.join(df.Ethnicity.tolist()))
French|Italian|Danish|Dutch|English|Irish

mask = df.Description.str.contains('|'.join(df.Ethnicity.tolist()))
print (mask)
0     True
1    False
2    False
3     True
4     True
5     True
Name: Description, dtype: bool

#boolean-indexing
print (df[mask])
  Ethnicity          Description
0    French  Irish Dance Company
3     Dutch               French
4   English        EnglishFrench
5     Irish       Irish-American

It looks like you can omit tolist():

print (df[df.Description.str.contains('|'.join(df.Ethnicity))])
  Ethnicity          Description
0    French  Irish Dance Company
3     Dutch               French
4   English        EnglishFrench
5     Irish       Irish-American
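
One caveat with joining raw values into a regex: if an Ethnicity value ever contained a regex metacharacter such as "." or "(", it would be interpreted as part of the pattern. A minimal sketch of a more defensive variant follows, assuming the sample data from the question; the re.escape step and the dropna/na=False handling are additions for illustration, not part of the answer above.

```python
import re

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Ethnicity": ["French", "Italian", "Danish", "Dutch", "English", "Irish"],
    "Description": [
        "Irish Dance Company",
        "Moroccan/Algerian",
        "Company in Netherlands",
        "French",
        "EnglishFrench",
        "Irish-American",
    ],
})

# Escape each value so it is matched literally, and skip missing values
# so that "|".join does not fail on NaN.
pattern = "|".join(re.escape(value) for value in df["Ethnicity"].dropna().unique())

# na=False treats a missing Description as "no match" instead of NaN.
mask = df["Description"].str.contains(pattern, na=False)
result = df[mask]
print(result)
```

With the sample data this selects the same rows as the unescaped version, since none of the values contain special characters.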

Upvotes: 5

piRSquared

Reputation: 294218

The ever-popular double apply:

df[df.Description.apply(lambda x: df.Ethnicity.apply(lambda y: y in x)).any(axis=1)]

  Ethnicity          Description
0    French  Irish Dance Company
3     Dutch               French
4   English        EnglishFrench
5     Irish       Irish-American

Timing

jezrael's answer is far superior
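
A rough way to reproduce the comparison yourself is sketched below. The enlarged frame (the sample data repeated 50 times) and the helper names regex_join and double_apply are assumptions made for this sketch, and the measured times will vary by machine and pandas version.

```python
import timeit

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Ethnicity": ["French", "Italian", "Danish", "Dutch", "English", "Irish"],
    "Description": [
        "Irish Dance Company",
        "Moroccan/Algerian",
        "Company in Netherlands",
        "French",
        "EnglishFrench",
        "Irish-American",
    ],
})

# Repeat the sample rows to get a frame large enough to time (300 rows).
big = pd.concat([df] * 50, ignore_index=True)


def regex_join(frame):
    # jezrael's approach: one regex built from all Ethnicity values.
    return frame[frame["Description"].str.contains("|".join(frame["Ethnicity"]))]


def double_apply(frame):
    # piRSquared's approach: test every description against every ethnicity.
    matches = frame["Description"].apply(
        lambda x: frame["Ethnicity"].apply(lambda y: y in x)
    )
    return frame[matches.any(axis=1)]


# Both approaches should select the same rows before we compare speed.
same_rows = regex_join(big).equals(double_apply(big))

t_regex = timeit.timeit(lambda: regex_join(big), number=3)
t_apply = timeit.timeit(lambda: double_apply(big), number=3)
print(f"same rows: {same_rows}")
print(f"regex join:   {t_regex:.4f} s for 3 runs")
print(f"double apply: {t_apply:.4f} s for 3 runs")
```

The double apply does work proportional to the number of rows squared, so the gap should widen as the frame grows.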


Upvotes: 1
