Reputation: 159
I have an example .csv, imported as df, as follows:
Ethnicity, Description
0 French, Irish Dance Company
1 Italian, Moroccan/Algerian
2 Danish, Company in Netherlands
3 Dutch, French
4 English, EnglishFrench
5 Irish, Irish-American
I'd like to check the pandas column df['Description'] for strings that appear in df['Ethnicity']. This should return rows 0, 3, 4, and 5, as those description strings contain values from the Ethnicity column.
So far I've tried:
df[df['Ethnicity'].str.contains('French')]['Description']
This returns the rows matching one specific string, but I'd like to check against every ethnicity value without searching for each one individually. I've also tried converting the columns to lists and iterating through them, but can't seem to find a way to return the row, as it is no longer a DataFrame().
Thank you in advance!
Upvotes: 4
Views: 2822
Reputation: 862406
You can use str.contains with the values in column Ethnicity converted to a list and then joined by |, which means "or" in regex:
print ('|'.join(df.Ethnicity.tolist()))
French|Italian|Danish|Dutch|English|Irish
mask = df.Description.str.contains('|'.join(df.Ethnicity.tolist()))
print (mask)
0 True
1 False
2 False
3 True
4 True
5 True
Name: Description, dtype: bool
#boolean-indexing
print (df[mask])
Ethnicity Description
0 French Irish Dance Company
3 Dutch French
4 English EnglishFrench
5 Irish Irish-American
It looks like you can omit tolist():
print (df[df.Description.str.contains('|'.join(df.Ethnicity))])
Ethnicity Description
0 French Irish Dance Company
3 Dutch French
4 English EnglishFrench
5 Irish Irish-American
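One caveat worth noting: joining the raw values with | builds a regex, so if an Ethnicity value ever contains a regex metacharacter (., +, parentheses, etc.), it would be interpreted as a pattern rather than literal text. A minimal sketch of a safer variant, using re.escape from the standard library and the sample data from the question:

```python
import re
import pandas as pd

df = pd.DataFrame({
    "Ethnicity": ["French", "Italian", "Danish", "Dutch", "English", "Irish"],
    "Description": ["Irish Dance Company", "Moroccan/Algerian",
                    "Company in Netherlands", "French", "EnglishFrench",
                    "Irish-American"],
})

# Escape each value so any regex metacharacters are matched literally
pattern = "|".join(re.escape(v) for v in df.Ethnicity)
mask = df.Description.str.contains(pattern)
print(df[mask])
```

With this sample data the result is identical to the answer above; the escaping only matters once the column contains special characters.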
Upvotes: 5
Reputation: 294218
The ever popular double apply:
df[df.Description.apply(lambda x: df.Ethnicity.apply(lambda y: y in x)).any(axis=1)]
Ethnicity Description
0 French Irish Dance Company
3 Dutch French
4 English EnglishFrench
5 Irish Irish-American
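For comparison, the same substring check can be sketched without regex or nested apply, using a plain list comprehension over the two columns (reconstructing the question's sample data):

```python
import pandas as pd

df = pd.DataFrame({
    "Ethnicity": ["French", "Italian", "Danish", "Dutch", "English", "Irish"],
    "Description": ["Irish Dance Company", "Moroccan/Algerian",
                    "Company in Netherlands", "French", "EnglishFrench",
                    "Irish-American"],
})

# For each description, check whether any ethnicity value occurs in it
mask = [any(eth in desc for eth in df.Ethnicity) for desc in df.Description]
print(df[mask])
```

This is O(n^2) in the number of rows, like the double apply, but avoids building the intermediate boolean DataFrame.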
jezrael's answer is far superior
Upvotes: 1