Yolo_chicken

Reputation: 1391

Remove rows that contain False in a column of pandas dataframe

I assume this is an easy fix, but I'm not sure what I'm missing. I have a DataFrame like this:

index                   c1     c2     c3
2015-03-07 01:27:05  False  False   True
2015-03-07 01:27:10  False  False   True
2015-03-07 01:27:15  False  False  False
2015-03-07 01:27:20  False  False   True
2015-03-07 01:27:25  False  False  False
2015-03-07 01:27:30  False  False   True

I want to remove any rows that contain False in c3. c3 has dtype bool. I keep running into problems because it's a boolean rather than a string, int, etc., and I haven't handled that before.
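To try the answers, the sample frame can be rebuilt roughly like this. This is a sketch that assumes the timestamps are the index (as the layout suggests) and that all three columns are plain bool:

```python
import pandas as pd

# Timestamps from the question, used as a DatetimeIndex (assumed, based on the layout)
idx = pd.to_datetime([
    "2015-03-07 01:27:05", "2015-03-07 01:27:10", "2015-03-07 01:27:15",
    "2015-03-07 01:27:20", "2015-03-07 01:27:25", "2015-03-07 01:27:30",
])

# Three boolean columns; only c3 varies
df = pd.DataFrame(
    {
        "c1": [False] * 6,
        "c2": [False] * 6,
        "c3": [True, True, False, True, False, True],
    },
    index=idx,
)

print(df.dtypes)
```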

Upvotes: 31

Views: 103382

Answers (5)

nocibambi

Reputation: 2431

Another option is to use pipe:

df.pipe(lambda x: x[x['c3']])

Like query, it works in a method chain, and it also works on a Series:

df['c3'].pipe(lambda x: x[x])
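A small runnable sketch of both forms, using a made-up three-row frame. It also shows pipe at the end of a chain, where the intermediate frame has no variable name; the assign step and the c4 column are purely illustrative:

```python
import pandas as pd

df = pd.DataFrame({"c1": [False, False, False], "c3": [True, False, True]})

# pipe hands the whole DataFrame to the lambda, which uses the bool column as a row mask
kept = df.pipe(lambda x: x[x["c3"]])

# The lambda refers to its own argument, so it works on an unnamed intermediate frame
chained = (
    df.assign(c4=lambda x: ~x["c3"])  # hypothetical extra step in the chain
      .pipe(lambda x: x[x["c3"]])
)

# On a Series, indexing it with itself keeps only the True entries
true_only = df["c3"].pipe(lambda s: s[s])

print(kept.index.tolist())       # [0, 2]
print(true_only.index.tolist())  # [0, 2]
```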

Upvotes: 1

ASGM

Reputation: 11391

Pandas deals with booleans in a really neat, straightforward manner:

df = df[df.c3]

This does the same thing with the explicit .loc indexer (note that selecting rows with a boolean mask returns a new DataFrame either way, so .loc does not avoid a copy):

df = df.loc[df.c3, :]

When you're filtering DataFrames with df[...], you usually write an expression that produces a boolean Series (like df.x > 2). In this case the column is already boolean, so you can pass df.c3 on its own, which selects all the rows where it is True.

If you wanted to get the opposite (as the original title to your question implied), you could use df[~df.c3] or df.loc[~df.c3, :], where the ~ inverts the booleans.
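A minimal sketch of the three forms on a small made-up frame with string labels, so the selected rows are easy to read off:

```python
import pandas as pd

df = pd.DataFrame(
    {"c1": [False] * 4, "c3": [True, False, True, False]},
    index=["a", "b", "c", "d"],
)

only_true = df[df.c3]             # rows where c3 is True
only_true_loc = df.loc[df.c3, :]  # same selection via the explicit .loc indexer
only_false = df[~df.c3]           # ~ inverts the bool mask: rows where c3 is False

print(only_true.index.tolist())   # ['a', 'c']
print(only_false.index.tolist())  # ['b', 'd']
```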

For more on boolean indexing in Pandas, see the docs. Thanks to @Mr_and_Mrs_D for the suggestion about .loc.

Upvotes: 59

Asclepius

Reputation: 63516

Consider DataFrame.query. It works in a method chain, so you avoid referring to the DataFrame by its variable name.

filtered_df = df.query('my_col')

This returns the rows where my_col evaluates to True. To invert the result, use query('~my_col') instead.

To do this in-place instead:

df.query('my_col', inplace=True)
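A runnable sketch of these calls, assuming a bool column named my_col (the answer's placeholder name) plus an illustrative other column:

```python
import pandas as pd

df = pd.DataFrame({"my_col": [True, False, True], "other": [1, 2, 3]})

filtered_df = df.query("my_col")   # rows where my_col is True
inverted_df = df.query("~my_col")  # rows where my_col is False

# query fits at the end of a chain, so the frame needs no variable name
chained = pd.DataFrame({"my_col": [False, True]}).query("my_col")

# inplace=True filters df itself and returns None
df.query("my_col", inplace=True)

print(filtered_df["other"].tolist())  # [1, 3]
print(inverted_df["other"].tolist())  # [2]
```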

Upvotes: 5

piRSquared

Reputation: 294516

Solution

df.drop(df[df['c3'] == False].index, inplace=True)

This explicitly drops the rows where 'c3' is False, rather than just keeping the rows that evaluate to True.
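A minimal sketch of the drop approach on a small made-up frame with integer labels:

```python
import pandas as pd

df = pd.DataFrame({"c3": [True, False, True, False]}, index=[10, 11, 12, 13])

# Collect the index labels of the rows where c3 is False ...
false_labels = df[df["c3"] == False].index
# ... then drop exactly those labels from df itself
df.drop(false_labels, inplace=True)

print(df.index.tolist())  # [10, 12]
```

Because drop works on index labels, it relies on the index being unique: if a label is shared by a True row and a False row, both rows are removed.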

Upvotes: 9

DeepSpace

Reputation: 81684

Well, the question's title and the question body say the exact opposite, but:

df = df[df['c3'] == True]  # df will have only rows with True in c3
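For a column that already has bool dtype, the explicit comparison selects the same rows as using the column directly as a mask. A quick check on a made-up frame:

```python
import pandas as pd

df = pd.DataFrame({"c3": [True, False, True]})

via_compare = df[df["c3"] == True]  # explicit comparison, as in the answer
via_mask = df[df["c3"]]             # the bool column used directly as the mask

print(via_compare.equals(via_mask))  # True
```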

Upvotes: 16
