stites

Reputation: 5143

Search for "does-not-contain" on a DataFrame in pandas

I've done some searching and found how to filter a DataFrame by

df["col"].str.contains(word)

However, I'm wondering whether there is a way to do the reverse: filter the DataFrame by that set's complement, e.g. something to the effect of

!(df["col"].str.contains(word))

Can this be done through a DataFrame method?

Upvotes: 292

Views: 449696

Answers (11)

Yanick N'depo

Reputation: 21

Another simple way is to use the query method, combined with an f-string and not.

It would look like this, where col holds the column name and word the search string: df.query(f"not {col}.str.contains('{word}')")
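
A minimal sketch of the same idea with a literal column name and word, so no f-string is needed. The sample data is made up for illustration, and passing engine="python" is an assumption on my part: string methods inside query have not always been supported by the default engine, so forcing the Python engine seems the safer choice.

```python
import pandas as pd

# Hypothetical sample data, not from the question.
df = pd.DataFrame({"col": ["apple pie", "banana", "cherry"]})

# "not" inside a query expression inverts the boolean mask.
# engine="python" is an assumption made to keep .str methods usable.
result = df.query("not col.str.contains('apple')", engine="python")
print(result)
```

If the column can hold missing values, the mask may not be purely boolean, so it is probably worth cleaning those up first.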

Upvotes: 2

Kyle Bennison

Reputation: 11

To add clarity to the top answer, the general pattern for dropping every column whose name contains a specific word is:

# Remove any column with "word" in the name
new_df = df.loc[:, ~df.columns.str.contains("word")]

# Remove any column with "word1" or "word2" in the name (regex alternation)
new_df = df.loc[:, ~df.columns.str.contains("word1|word2")]
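
Here is a small runnable sketch of that column filter. The column names are hypothetical and only there to show which ones survive.

```python
import pandas as pd

# Hypothetical columns, chosen only for illustration.
df = pd.DataFrame(
    {
        "price_usd": [1.0],
        "price_eur": [0.9],
        "name": ["widget"],
        "tmp_flag": [True],
    }
)

# Keep every column whose name does NOT contain "price".
no_price = df.loc[:, ~df.columns.str.contains("price")]

# "|" is regex alternation: drop names containing "price" or "tmp".
trimmed = df.loc[:, ~df.columns.str.contains("price|tmp")]

print(list(no_price.columns))
print(list(trimmed.columns))
```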

Upvotes: 1

Andy Hayden

Reputation: 375565

You can use the invert (~) operator (which acts like a not for boolean data):

new_df = df[~df["col"].str.contains(word)]

where new_df is the copy returned by RHS.

contains also accepts a regular expression...


If the above throws a ValueError or TypeError, the likely reason is that the column has mixed datatypes, so use na=False:

new_df = df[~df["col"].str.contains(word, na=False)]

Or,

new_df = df[df["col"].str.contains(word) == False]
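
As a sketch of the na=False variant on a column with a missing value (the data here is invented for illustration): the missing entry is treated as "does not contain", so it survives the inverted filter.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value.
df = pd.DataFrame({"col": ["apple pie", "banana", np.nan, "apple"]})
word = "apple"

# na=False makes missing entries count as "no match",
# so the mask is purely boolean and ~ is safe to apply.
mask = df["col"].str.contains(word, na=False)
new_df = df[~mask]
print(new_df)
```

Note that word is interpreted as a regular expression by default; passing regex=False to contains should give a plain substring match instead.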

Upvotes: 556

Bhanu Chander

Reputation: 490

Somehow '.contains' didn't work for me, but '.isin', as mentioned by @kenan in an answer to "How to drop rows from pandas data frame that contains a particular string in a particular column?", did. Note that '.isin' matches whole cell values, not substrings. In addition, if you want to look at the entire DataFrame and remove the rows that have the specific word (or set of words) in any column, just use the loop below:

for col in df.columns:
    df = df[~df[col].isin(['string or string list separated by comma'])]

Remove the ~ to keep the matching rows instead (inside the loop, that keeps only rows that match in every column).
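
A loop-free sketch of the same idea, assuming the goal is to drop rows where any cell equals one of the unwanted values exactly. The data and the name unwanted are made up for the example.

```python
import pandas as pd

# Hypothetical data; "unwanted" is an illustrative name.
df = pd.DataFrame(
    {
        "a": ["spam", "eggs", "ham"],
        "b": ["ham", "spam and eggs", "toast"],
    }
)
unwanted = ["spam"]

# DataFrame.isin tests every cell for an exact match;
# any(axis=1) flags rows with at least one hit.
hits = df.isin(unwanted).any(axis=1)
cleaned = df[~hits]
print(cleaned)
```

The row holding "spam and eggs" is kept, because isin compares whole values and does not search inside strings.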

Upvotes: 2

vasanth

Reputation: 49

To complement the question above, if someone wants to remove all the rows whose value is a string, one could do:

df_new = df[~df['col_name'].apply(lambda x: isinstance(x, str))]

Upvotes: 1

Noordeen

Reputation: 1614

Answers have already been posted, so I am adding a general pattern for finding multiple words and excluding them from a DataFrame.

Here:

'word1', 'word2', 'word3', 'word4' = list of patterns to search for

df = the DataFrame

column_a = a column name from DataFrame df

values_to_remove = ['word1','word2','word3','word4'] 

pattern = '|'.join(values_to_remove)

result = df.loc[~df['column_a'].str.contains(pattern, case=False)]
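
One caveat: the joined pattern is treated as a regular expression, so words containing characters like "." or "+" may match more than intended. A sketch of one way around that, assuming the words are meant literally, is to escape each one with re.escape before joining. The sample data is invented for illustration.

```python
import re

import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"column_a": ["Word1 here", "nothing", "a.b literal", "axb"]})
values_to_remove = ["word1", "a.b"]

# Escape each word so regex metacharacters are matched literally.
pattern = "|".join(re.escape(v) for v in values_to_remove)

# case=False ignores case; na=False treats missing values as non-matches.
result = df.loc[~df["column_a"].str.contains(pattern, case=False, na=False)]
print(result)
```

Without the escaping, "a.b" would also match "axb", since "." matches any character.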

Upvotes: 21

rachwa

Reputation: 2310

To negate your query use ~. Using query has the advantage of returning the valid observations of df directly:

df.query('~col.str.contains("word").values')

Upvotes: 10

Arash

Reputation: 1054

You can use apply with a lambda:

df[df["col"].apply(lambda x: word not in x)]

Or, if you want to define a more complex rule, you can combine conditions with and:

df[df["col"].apply(lambda x: word_1 not in x and word_2 not in x)]
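
One thing to watch with this approach: "word not in x" raises a TypeError when x is a missing value, because a float is not iterable. A sketch of a guarded version follows; the helper name lacks_word and the data are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value.
df = pd.DataFrame({"col": ["foo bar", "baz", np.nan, "bar"]})
word = "bar"


def lacks_word(value):
    """Return True unless value is a string that contains word."""
    return not (isinstance(value, str) and word in value)


# Non-string entries (such as NaN) are kept rather than raising.
kept = df[df["col"].apply(lacks_word)]
print(kept)
```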

Upvotes: 22

nanselm2

Reputation: 1497

I was having trouble with the not (~) symbol as well, so here's another way from another Stack Overflow thread:

df[df["col"].str.contains('this|that')==False]
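
A sketch of how this differs from the ~ form when values are missing, using an object-dtype Series so the behaviour is explicit (the data is made up). Comparing with == False drops the missing row, while ~ with na=False keeps it.

```python
import numpy as np
import pandas as pd

# Hypothetical data; dtype=object is forced on purpose so that
# contains() returns NaN for the missing entry.
s = pd.Series(["this one", "other", np.nan], dtype=object)

# NaN == False is False, so the missing row is filtered out.
eq_false = s.str.contains("this|that") == False

# With na=False the missing row counts as "no match" and is kept.
inverted = ~s.str.contains("this|that", na=False)

print(eq_false.tolist())
print(inverted.tolist())
```

Which behaviour is right depends on whether rows with missing values should count as "does not contain".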

Upvotes: 94

U13-Forward

Reputation: 71580

In addition to nanselm2's answer, you can use 0 instead of False:

df["col"].str.contains(word)==0

Upvotes: 3

Shoresh

Reputation: 2853

I had to get rid of the NULL values before using the command recommended by Andy above. An example:

import pandas as pd

# .ix was removed in pandas 1.0; .loc is the label-based equivalent
df = pd.DataFrame(index=[0, 1, 2], columns=['first', 'second', 'third'])
df.loc[:, 'first'] = 'myword'
df.loc[0, 'second'] = 'myword'
df.loc[2, 'second'] = 'myword'
df.loc[1, 'third'] = 'myword'
df

    first   second  third
0   myword  myword  NaN
1   myword  NaN     myword
2   myword  myword  NaN

Now running the command:

~df["second"].str.contains(word)

I get the following error:

TypeError: bad operand type for unary ~: 'float'

I got rid of the NULL values using dropna() or fillna() first and retried the command with no problem.
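
A minimal sketch of the fillna() route, assuming an empty string is an acceptable stand-in for missing values. It rebuilds only the 'second' column from the example above.

```python
import numpy as np
import pandas as pd

# Only the 'second' column from the example, rebuilt for illustration.
df = pd.DataFrame({"second": ["myword", np.nan, "myword"]})
word = "myword"

# Replace missing values with "" so contains() yields only True/False.
filled = df["second"].fillna("")
without_word = df[~filled.str.contains(word)]
print(without_word)
```

Because the mask is built from a filled copy, the original NaN is still present in the filtered result.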

Upvotes: 8
