GNMO11

Reputation: 2259

Pandas filter rows by substring within text column

I have a list of keywords and a DataFrame that contains a text column. I am trying to filter out every row where the text column contains one of the keywords. I believe what I am looking for is something like the .isin method, but one that accepts a regex argument, since I am searching for substrings within the text rather than exact matches.

What I have:

keys = ['key','key2']

   A        Text
0  5   Sample text one
1  6   Sample text two 
2  3   Sample text three key
3  4   Sample text four key2

I would like to remove any rows whose text contains a key, so I would end up with:

   A        Text
0  5   Sample text one
1  6   Sample text two 

Upvotes: 1

Views: 1964

Answers (1)

EdChum

Reputation: 393963

Use str.contains with the keys joined by | to build a regex pattern, then negate the resulting boolean mask with ~ to filter your df:

In [123]:
keys = ['key','key2']
df[~df['Text'].str.contains('|'.join(keys))]

Out[123]:
   A              Text
0  5   Sample text one
1  6   Sample text two
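
For reference, here is a self-contained sketch of the same approach that rebuilds the sample frame from the question. Two things in it are my own additions rather than part of the snippet above: re.escape, on the assumption that the keys should be matched as literal substrings even if they contain regex metacharacters such as . or +, and na=False, on the assumption that the Text column might contain missing values. The names pattern, mask and filtered are only illustrative.

```python
import re

import pandas as pd

keys = ['key', 'key2']

# Sample data rebuilt from the question.
df = pd.DataFrame({
    'A': [5, 6, 3, 4],
    'Text': [
        'Sample text one',
        'Sample text two',
        'Sample text three key',
        'Sample text four key2',
    ],
})

# Escape each key so regex metacharacters are matched literally
# (assumption: keys are plain substrings, not patterns).
pattern = '|'.join(re.escape(k) for k in keys)

# na=False treats missing text as "no match", keeping the mask boolean.
mask = df['Text'].str.contains(pattern, na=False)

# Keep only the rows whose text contains none of the keys.
filtered = df[~mask]
print(filtered)
```

If the keys are meant to be regex patterns themselves, drop the re.escape call and join them directly as in the snippet above.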

Upvotes: 5
