Reputation: 2259
I have a list of keywords as well as a DF that contains a text column. I am trying to filter out every row where the text in the text field contains one of the keywords. I believe what I am looking for is something like the .isin method, but one that can take a regex argument, as I am searching for substrings within the text, not exact matches.
What I have:
keys = ['key','key2']
   A                   Text
0  5        Sample text one
1  6        Sample text two
2  3  Sample text three key
3  4  Sample text four key2
And I would like to remove any rows that contain a key in the text so I would end up with:
   A             Text
0  5  Sample text one
1  6  Sample text two
Upvotes: 1
Views: 1964
Reputation: 393963
Use str.contains and join the keys with | to create a regex pattern, then negate the resulting boolean mask with ~ to filter your df:
In [123]:
keys = ['key','key2']
df[~df['Text'].str.contains('|'.join(keys))]
Out[123]:
   A             Text
0  5  Sample text one
1  6  Sample text two
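If the keys might contain regex metacharacters (such as . or +) or the Text column might hold missing values, a slightly more defensive variant could look like the sketch below. The sample frame is rebuilt from the question; escaping with re.escape and passing na=False are my additions rather than part of the one-liner above, and the names pattern, mask and filtered are just illustrative.

```python
import re

import pandas as pd

# Sample data reconstructed from the question
df = pd.DataFrame({
    "A": [5, 6, 3, 4],
    "Text": [
        "Sample text one",
        "Sample text two",
        "Sample text three key",
        "Sample text four key2",
    ],
})
keys = ["key", "key2"]

# Escape each key so regex metacharacters are matched literally,
# then join with | so the pattern matches any one of the keys
pattern = "|".join(re.escape(k) for k in keys)

# na=False treats missing text as "no match", so such rows are kept
mask = df["Text"].str.contains(pattern, na=False)

# ~ inverts the mask: keep only the rows that contain none of the keys
filtered = df[~mask]
print(filtered)
```

Note that this is still substring matching, so a key such as 'key' would also drop a row containing 'monkey'; wrapping each escaped key in \b word boundaries is one way to tighten that if needed.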
Upvotes: 5