RoccoMaxamas

Reputation: 349

Checking DataFrame column in Pandas using both .isin and str.contains

I want to select the rows of a DataFrame based on whether a column's value contains (i.e., has as a substring) any of the strings in a list.

For example,

import pandas as pd

values = ['dog', 'cat', 'ant']

df = pd.DataFrame({'col1': ['dog', 'cat', 'fox', 'monkey', 'antelope'], 'col2': [3, 4, 1, 6, 9]})

I know that if I want to match against a single substring, I can do:

df[df['col1'].str.contains('dog')]

And if I knew the full values (as opposed to just a substring), I could do:

df.loc[df['col1'].isin(values)]
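To make the difference between the two calls concrete, here is a small sketch on the example frame from the question: isin tests whole-value equality against every element of the list, while str.contains tests for a substring (a regular expression by default) but takes only one pattern.

```python
import pandas as pd

values = ['dog', 'cat', 'ant']
df = pd.DataFrame({'col1': ['dog', 'cat', 'fox', 'monkey', 'antelope'],
                   'col2': [3, 4, 1, 6, 9]})

# isin: exact, whole-value equality against any element of the list
exact = df.loc[df['col1'].isin(values)]
print(exact['col1'].tolist())   # ['dog', 'cat']  ('antelope' is not equal to 'ant')

# str.contains: substring test against a single pattern
sub = df[df['col1'].str.contains('ant')]
print(sub['col1'].tolist())     # ['antelope']
```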

However, I'm not sure how to combine the two functions.

I was thinking I could loop over the values:

def func(data):
    for x in values:
        if x in data:
            return True
    return False

df['include'] = df.apply(func)

But this doesn't work (my new column is all NaN values), and it seems like there is probably a better way.
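A sketch of why the loop above yields NaN, assuming the df and values from the question: with the default axis=0, DataFrame.apply calls func once per column and passes the whole column as a Series. For a Series, `x in data` checks the index labels (0 to 4), not the cell values, so func returns False for each column. The result is a Series indexed by the column names, and assigning it to df['include'] aligns on the row index, where no labels match, hence NaN. Applying func to the column itself (Series.apply) passes one string at a time, so `x in data` becomes an ordinary substring test.

```python
import pandas as pd

values = ['dog', 'cat', 'ant']
df = pd.DataFrame({'col1': ['dog', 'cat', 'fox', 'monkey', 'antelope'],
                   'col2': [3, 4, 1, 6, 9]})

def func(data):
    # With Series.apply, data is a single string, so `x in data` is a substring test
    for x in values:
        if x in data:
            return True
    return False

# What df.apply(func) returns: one value per COLUMN, indexed by 'col1', 'col2'
per_column = df.apply(func)
print(per_column)

# Series.apply calls func once per cell value instead
df['include'] = df['col1'].apply(func)
print(df['include'].tolist())   # [True, True, False, False, True]

# Equivalent one-liner using any()
df['include'] = df['col1'].apply(lambda s: any(v in s for v in values))
print(df[df['include']])
```

This keeps the plain-Python loop; the vectorized str.contains approach in the answer avoids calling a Python function per row.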

Upvotes: 1

Views: 773

Answers (1)

Cyprian

Reputation: 11374

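A bit late, but this should work ;) Joining the list with '|' builds a single regular expression (dog|cat|ant), so str.contains matches a row if any of the substrings occurs in it: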

df = df[df['col1'].str.contains('|'.join(values), case=False, na=False)]
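One caveat, offered as a hedged refinement: str.contains treats its pattern as a regular expression by default, so if any search string contains regex metacharacters such as '.', '+' or '(', they would be interpreted as regex syntax. A minimal sketch that escapes each substring with re.escape before joining is shown below. The extra None row is hypothetical, added only to show what na=False does: missing cells count as non-matches instead of producing NaN, which would break boolean indexing. case=False makes the match case-insensitive.

```python
import re
import pandas as pd

values = ['dog', 'cat', 'ant']
# Hypothetical extra row with a missing value, to illustrate na=False
df = pd.DataFrame({'col1': ['dog', 'cat', 'fox', 'monkey', 'antelope', None],
                   'col2': [3, 4, 1, 6, 9, 2]})

# Escape each substring so special characters are matched literally,
# then join with '|' (regex alternation): a match on ANY substring counts
pattern = '|'.join(map(re.escape, values))

# case=False: case-insensitive; na=False: missing cells become False
mask = df['col1'].str.contains(pattern, case=False, na=False)
result = df[mask]
print(result)
```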

Upvotes: 0
