Sarfraz

Reputation: 27

Selecting and deleting rows that contain any of a list of words from a whole pandas DataFrame in Python

I have a large dataset and I want to remove every row that contains any of several words, such as ('test', 'TEST', 'Test'). I am not sure how to do it. I tried this:

test_remove=df[df['Column1'].str.contains('test') 
|df['Column2'].str.contains('test') 
|df['Column3'].str.contains('test') 
|df['Column1'].str.contains('Test')
|df['Column2'].str.contains('Test') 
|df['Column3'].str.contains('Test')].index

Then, to remove those rows from the DataFrame:

df.drop(test_remove, inplace=True)

This works, but with many columns and several keywords I have to write very long code. Is there a shorter way to select all rows that contain any word from a list and then remove them from the DataFrame? Thanks.

Upvotes: 1

Views: 36

Answers (2)

Riccardo Bucco

Reputation: 15364

You can dynamically build a string containing all the conditions and then evaluate it with eval:

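# List of columns to check
columns = ['col1', 'col2', 'col3']
# List of words to check
words = ['test', 'TEST', 'Test']

# Build "df['col1'].str.contains('test')|df['col1'].str.contains('TEST')|..."
# and evaluate it into one boolean mask; keep the matching row labels
test_remove = df[eval('|'.join(f"df['{col}'].str.contains('{word}')"
                               for col in columns
                               for word in words))].index
df.drop(test_remove, inplace=True)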
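The same mask can also be built without eval, which avoids assembling Python source as a string (and breaking if a word contains a quote). A minimal sketch, assuming the checked columns hold strings; the sample data and column names below are hypothetical:

```python
import operator
from functools import reduce

import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({'col1': ['x', 'test', 'this'],
                   'col2': ['y', 'z', 'Test'],
                   'col3': ['z', 'y', 'a']})

columns = ['col1', 'col2', 'col3']
words = ['test', 'TEST', 'Test']

# One boolean Series per (column, word) pair, OR-ed together into a single mask.
# regex=False treats each word literally; na=False counts missing values as "no match".
mask = reduce(operator.or_,
              (df[col].str.contains(word, regex=False, na=False)
               for col in columns
               for word in words))

test_remove = df[mask].index
df.drop(test_remove, inplace=True)
print(df)
```

Only the first row survives here, since 'test' appears in col1 of row 1 and 'Test' in col2 of row 2.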

Upvotes: 0

Pramote Kuacharoen

Reputation: 1541

import pandas as pd

data = {'A': ['x', 'test', 'this', 'that'],
        'B': ['y', 'z', 'a', 'b'],
        'C': ['z', 'y', 'TEST', 'me']}
df = pd.DataFrame(data)
columns = df.columns
words = ['test', 'TEST', 'Test']

# Start with "keep every row", then clear the flag for any row
# where some column contains one of the words
mask = pd.Series(True, index=df.index)
for col in columns:
    for word in words:
        mask = mask & ~df[col].str.contains(word)

df = df[mask]

Output

      A  B   C
0     x  y   z
3  that  b  me
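Because str.contains accepts a regular expression and a case flag, the whole word list can be matched in one call per column instead of one call per (column, word) pair. A minimal sketch reusing the sample data above; converting each column with astype(str) is an assumption made so that non-string columns do not break the .str accessor:

```python
import re

import pandas as pd

data = {'A': ['x', 'test', 'this', 'that'],
        'B': ['y', 'z', 'a', 'b'],
        'C': ['z', 'y', 'TEST', 'me']}
df = pd.DataFrame(data)

# With case=False, 'test' also matches 'TEST' and 'Test'
words = ['test']
# Escape each word and join them into one alternation pattern, e.g. "test|foo"
pattern = '|'.join(re.escape(w) for w in words)

# contains() on every column gives a DataFrame of booleans;
# any(axis=1) flags a row if any of its cells matches
hits = df.apply(lambda s: s.astype(str).str.contains(pattern, case=False))
df = df[~hits.any(axis=1)]
print(df)
```

This keeps rows 0 and 3, the same result as the loop above.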

Upvotes: 1
