Reputation: 27
Sample Data

So I have a large data set and I want to remove all the rows containing any of several words like ('test', 'TEST', 'Test'), but I am not sure how to do it. I tried one way like this:
test_remove = df[df['Column1'].str.contains('test')
                 | df['Column2'].str.contains('test')
                 | df['Column3'].str.contains('test')
                 | df['Column1'].str.contains('Test')
                 | df['Column2'].str.contains('Test')
                 | df['Column3'].str.contains('Test')].index
Then, to remove those rows from the dataframe:
df.drop(test_remove, inplace=True)
This works, but with many columns and multiple keywords I have to write very long code to get the result. Is there a shorter way to do this, i.e. select all the rows containing any word from a list and then remove them from the dataframe? Thanks
Upvotes: 1
Views: 36
Reputation: 15364
You can dynamically generate a string with all the conditions and then evaluate it with eval:
# List of columns to check
columns = ['col1', 'col2', 'col3']
# List of words to check
words = ['test', 'TEST', 'Test']
test_remove = df[eval('|'.join(f"df['{col}'].str.contains('{word}')"
                               for col in columns
                               for word in words))]
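
As a quick end-to-end sketch of how this fits together (the sample frame, column names and words below are made up for illustration), the selection's index can be passed to drop exactly as in the question:

import pandas as pd

# Hypothetical sample data, just for illustration
df = pd.DataFrame({'col1': ['x', 'test', 'keep'],
                   'col2': ['y', 'z', 'Test'],
                   'col3': ['z', 'ok', 'fine']})

columns = ['col1', 'col2', 'col3']
words = ['test', 'TEST', 'Test']

# Build the boolean expression as a string, evaluate it, and take the index
expr = '|'.join(f"df['{col}'].str.contains('{word}')"
                for col in columns
                for word in words)
test_remove = df[eval(expr)].index

# Drop the matching rows, mirroring the question's df.drop call
df.drop(test_remove, inplace=True)
print(df)   # only the row without any of the words remains

One caveat: eval executes whatever string it is given, so this is best kept to column and word lists you control yourself.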
Upvotes: 0
Reputation: 1541
import pandas as pd

data = {'A': ['x', 'test', 'this', 'that'],
        'B': ['y', 'z', 'a', 'b'],
        'C': ['z', 'y', 'TEST', 'me']}
df = pd.DataFrame(data)

columns = df.columns
words = ['test', 'TEST', 'Test']

# Start with an all-True mask and switch off every row that contains a word
mask = True
for col in columns:
    for word in words:
        mask = mask & ~df[col].str.contains(word)

df = df[mask]
Output

      A  B   C
0     x  y   z
3  that  b  me
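
Since the three words here differ only in case, the inner loop can also be collapsed with the case=False argument of str.contains; this only helps for case variants of the same word, but a self-contained sketch on the same sample data gives the same output as above:

import pandas as pd

df = pd.DataFrame({'A': ['x', 'test', 'this', 'that'],
                   'B': ['y', 'z', 'a', 'b'],
                   'C': ['z', 'y', 'TEST', 'me']})

mask = pd.Series(True, index=df.index)
for col in df.columns:
    # case=False matches 'test', 'TEST' and 'Test' in one pass
    mask &= ~df[col].str.contains('test', case=False)

print(df[mask])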
Upvotes: 1