elksie5000

Reputation: 7782

Using boolean masks in Pandas

This is probably a trivial query but I can't work it out.

Essentially, I want to filter noisy tweets out of the dataframe below:

<class 'pandas.core.frame.DataFrame'>
Int64Index: 140381 entries, 0 to 140380
Data columns:
text          140381  non-null values
created_at    140381  non-null values
id            140381  non-null values
from_user     140381  non-null values
geo           5493  non-null values
dtypes: float64(1), object(4)

I can create a dataframe of the tweets that contain an unwanted keyword like this:

junk = df[df.text.str.contains("Swans")]

But what's the best way to use this to get what's left, i.e. the tweets that don't match?

Upvotes: 3

Views: 5428

Answers (2)

Mohamed Ali JAMAOUI

Reputation: 14699

You can also use the following two options:

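Option 1 (unary minus; NumPy rejects the "-" operator on plain boolean arrays, so "~" is the safer spelling):

df[-df.text.str.contains("Swans")]

Option 2: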

import numpy as np
df[np.invert(df.text.str.contains("Swans"))]
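
As a quick sanity check, here is a small sketch showing that np.invert and the ~ operator give the same result on a boolean mask. The three-element Series is made up for illustration and is not the asker's data.

```python
import numpy as np
import pandas as pd

# Hypothetical boolean mask, standing in for df.text.str.contains("Swans")
mask = pd.Series([True, False, True])

inverted_op = ~mask            # bitwise NOT via the operator
inverted_np = np.invert(mask)  # the same operation via the NumPy ufunc

# Both spellings should produce an identical boolean Series
same = inverted_op.equals(inverted_np)
```

Since the two are equivalent, ~ is usually the shorter choice and avoids the extra import.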

Upvotes: 1

waitingkuo

Reputation: 93964

df[~df.text.str.contains("Swans")]
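
To spell that out, here is a minimal sketch on a made-up dataframe. The sample rows and the names mask and clean are assumptions for illustration, and na=False is an optional extra that is not in the one-liner above. The mask is computed once, then used as-is for the junk rows and negated with ~ for the rest.

```python
import pandas as pd

# Hypothetical sample data standing in for the tweets dataframe
df = pd.DataFrame({
    "text": [
        "Swans win again",
        "Lovely weather today",
        "Up the Swans!",
        "New pandas release",
    ],
    "from_user": ["a", "b", "c", "d"],
})

# Boolean mask: True where the tweet mentions the unwanted keyword.
# na=False counts missing text as "no match", so the mask stays purely boolean.
mask = df["text"].str.contains("Swans", na=False)

junk = df[mask]    # rows that match the keyword
clean = df[~mask]  # everything else
```

Every row lands in exactly one of junk or clean, so the two lengths add up to len(df). In the question's dataframe the text column has no missing values, so na=False would not change anything there; it only matters when the column can contain NaN.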

Upvotes: 6
