Removing rows in a pandas DataFrame where the row contains a string present in a list?

Question

I know how to remove rows from a single-column ('From') pandas DataFrame where the row contains a given string, e.g. given df and someString:

df = df[~df.From.str.contains(someString)]

Now I want to do something similar, but this time remove any row that contains any of the strings in a list. Were I not using pandas, I would use a for loop with an if ... not in ... check. How do I take advantage of pandas' own functionality to achieve this? Given the list of items to remove, ignorethese, extracted from EMAILS_TO_IGNORE, a file of comma-separated strings, I tried:

with open(EMAILS_TO_IGNORE) as emails:
    ignorethese = emails.read().split(', ')
    df = df[~df.From.isin(ignorethese)]

Am I overcomplicating matters by first splitting the file into a list? Given that it is a plain-text file of comma-separated values, can I bypass this step with something simpler?

Anand S Kumar · Accepted Answer

Series.str.contains supports regular expressions, so you can build a regex from your list of emails to ignore by joining them with | (regex OR) and then pass that to contains. Example -

df[~df.From.str.contains('|'.join(ignorethese))]
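As a rough illustration of why the isin attempt from the question behaves differently: isin tests each value for whole-string equality against the list, while str.contains searches inside each value. The sketch below uses made-up addresses and assumes the From column may hold a display name plus an address; the names s, ignore, exact and partial are hypothetical.

```python
import pandas as pd

# Hypothetical values: one with a display name around the address, one bare address.
s = pd.Series(["Alice Example <alice@example.com>", "carol@example.net"])
ignore = ["alice@example.com", "carol@example.net"]

# isin: True only when the whole cell equals one of the list entries.
exact = s.isin(ignore)

# str.contains: True when any list entry is found anywhere inside the cell.
partial = s.str.contains("|".join(ignore))

print(exact.tolist())
print(partial.tolist())
```

So if the cells contain anything besides the bare address, isin will not flag them, whereas the joined pattern will.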

Demo -

In [109]: df
Out[109]:
                                         From
0         Grey Caulfu 
1  Deren Torculas 
2    Charlto Youna 

In [110]: ignorelist = ['grey.caulfu@ymail.com','deren.e.torcs87@gmail.com']

In [111]: ignorere = '|'.join(ignorelist)

In [112]: df[~df.From.str.contains(ignorere)]
Out[112]:
                                       From
2  Charlto Youna

Please note that, as mentioned in the documentation, str.contains uses re.search(), so the pattern can match anywhere in the string.
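Because the joined string is treated as a regular expression, characters such as . in an email address act as wildcards unless escaped. A minimal sketch of a more defensive variant follows, assuming made-up sample data and an in-memory string (raw) standing in for the contents of the ignore file; re.escape and the na=False argument are additions that are not part of the answer above.

```python
import re

import pandas as pd

# Hypothetical sample data; these addresses are invented for illustration.
df = pd.DataFrame({
    "From": [
        "Alice Example <alice@example.com>",
        "Bob Sample <bob.sample@example.org>",
        "Carol Other <carol@example.net>",
    ]
})

# Stand-in for the text read from the comma-separated ignore file.
raw = "alice@example.com, bob.sample@example.org\n"

# Split on commas, strip surrounding whitespace/newlines, and drop empty entries.
ignorethese = [item.strip() for item in raw.split(",") if item.strip()]

# Escape regex metacharacters so each address is matched literally.
pattern = "|".join(re.escape(item) for item in ignorethese)

# na=False treats missing values as "no match", so those rows are kept.
filtered = df[~df["From"].str.contains(pattern, na=False)]
print(filtered)
```

One caveat with this approach: if the ignore list ends up empty, the pattern is an empty string, which matches every row, so it may be worth checking for that case before filtering.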
