Pandas filter rows by substring within text column

Question

I have a list of keywords and a DataFrame that contains a text column. I am trying to filter out every row whose text contains one of the keywords. I believe what I am looking for is something like the .isin method, but one that accepts a regex argument, since I am searching for substrings within the text rather than exact matches.

What I have:

keys = ['key','key2']

   A   Text
0  5   Sample text one
1  6   Sample text two
2  3   Sample text three key
3  4   Sample text four key2

I would like to remove every row whose text contains a key, so that I end up with:

   A   Text
0  5   Sample text one
1  6   Sample text two
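A minimal sketch that rebuilds the sample frame (assuming the columns are named A and Text, as displayed) and shows why .isin alone does not help here: it tests each cell for equality with the listed values, so a sentence that merely contains a keyword never matches.

```python
import pandas as pd

# Rebuild the sample frame from the question
df = pd.DataFrame({
    "A": [5, 6, 3, 4],
    "Text": [
        "Sample text one",
        "Sample text two",
        "Sample text three key",
        "Sample text four key2",
    ],
})
keys = ["key", "key2"]

# isin compares whole cell values, so no sentence equals a bare keyword
exact = df["Text"].isin(keys)
print(exact.any())  # False
```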

EdChum · Accepted Answer

Use str.contains with the keys joined by | to build a regex alternation pattern, then negate the resulting boolean mask with ~ to filter your df:

In [123]:
keys = ['key','key2']    
df[~df['Text'].str.contains('|'.join(keys))]

Out[123]:
   A              Text
0  5   Sample text one
1  6   Sample text two
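The one-liner above works for the sample data. A slightly more defensive sketch, assuming keywords might contain regex metacharacters and the column might have missing values: re.escape makes each keyword match literally, na=False counts missing text as a non-match so the mask stays strictly boolean, and an optional \b...\b wrapper restricts matches to whole words. The "keyboard" row and the missing value are hypothetical, added only to show the difference.

```python
import re

import pandas as pd

df = pd.DataFrame({
    "A": [5, 6, 3, 4, 7, 8],
    "Text": [
        "Sample text one",
        "Sample text two",
        "Sample text three key",
        "Sample text four key2",
        "Sample text keyboard",  # hypothetical row: contains 'key' as a substring
        None,                    # hypothetical missing value
    ],
})
keys = ["key", "key2"]

# Escape each keyword so characters like '.', '+' or '(' are matched literally,
# then join with '|' into one alternation pattern
pattern = "|".join(map(re.escape, keys))

# Substring matching: na=False treats the missing value as "no match", so it is kept
mask = df["Text"].str.contains(pattern, regex=True, na=False)
filtered = df[~mask]
print(filtered)

# Whole-word matching: 'key' no longer matches inside 'keyboard'
word_pattern = r"\b(?:" + pattern + r")\b"
word_mask = df["Text"].str.contains(word_pattern, regex=True, na=False)
whole_word = df[~word_mask]
print(whole_word)
```

With plain substring matching the "keyboard" row is dropped because it contains "key"; the \b version keeps it. Note that in substring mode "key2" is already caught by "key", so listing both keys changes nothing there.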
