Jarratt Perkins

Reputation: 303

Pandas remove row if all values in that row are false

I am looking to tidy up some rows in my dataset:

output = {
         'id1': ['False', 'False', 'False', 'False'],
         'id2': ['True', 'False', 'False', 'False'],
         'id3': ['True', 'False', 'True', 'False'],
         'id4': ['True', 'False', 'True', 'True'],
    }

In the data above, the second row contains only 'False' values.

I won't need that row, so I would like to remove it:

newoutput = {
         'id1': ['False',  'False', 'False'],
         'id2': ['True',  'False', 'False'],
         'id3': ['True',  'True', 'False'],
         'id4': ['True',  'True', 'True'],
    }

I got as far as checking for rows that contain False:

output.drop(output[output != False].index, inplace=True)

But that just looks at ANY value in the row being False, not ALL of them.

Upvotes: 3

Views: 5332

Answers (2)

jezrael

Reputation: 862406

  • This assumes a dataframe of Boolean values, not string type.
  • As noted in the other answer, use .replace to convert the strings to Boolean values first.

Given

import pandas as pd

df = pd.DataFrame({'id1': [True, False, False, False], 'id2': [True, False, False, False], 'id3': [True, False, True, False], 'id4': [True, False, True, True]})

     id1    id2    id3    id4
0   True   True   True   True
1  False  False  False  False
2  False  False   True   True
3  False  False  False   True

Use DataFrame.any to match at least one True with boolean indexing, which will remove rows that are all False.

df = df[df.any(axis=1)]

     id1    id2    id3   id4
0   True   True   True  True
2  False  False   True  True
3  False  False  False  True
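Since the question attempted DataFrame.drop, the same filter can also be written as a drop. This is only a minimal sketch, assuming the Boolean df from this answer; the names all_false and result are illustrative and not from the original post.

```python
import pandas as pd

# Same Boolean frame as in the answer above.
df = pd.DataFrame({
    'id1': [True, False, False, False],
    'id2': [True, False, False, False],
    'id3': [True, False, True, False],
    'id4': [True, False, True, True],
})

# Mask of rows in which no value is True, i.e. every value is False.
all_false = ~df.any(axis=1)

# Drop those rows by index label; the remaining labels are left unchanged.
result = df.drop(df.index[all_false])
print(result)
```

Both forms keep the original index labels; calling reset_index(drop=True) afterwards would renumber the rows if that is wanted.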

Alternatively, to delete rows that are all True, use .all() and negate with ~.

df[~df.all(axis=1)]

     id1    id2    id3    id4
1  False  False  False  False
2  False  False   True   True
3  False  False  False   True

Upvotes: 10

timgeb

Reputation: 78650

I constructed df via df = pd.DataFrame(output).

You can use

>>> df[df.replace({'False': False, 'True': True}).any(axis=1)]
     id1    id2    id3   id4
0  False   True   True  True
2  False  False   True  True
3  False  False  False  True

I strongly suggest using booleans instead of strings to indicate True and False. In that case the solution is a simple reassignment to df[df.any(axis=1)].
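One possible way to follow that suggestion with the question's data is sketched below. It assumes every cell holds exactly the string 'True' or 'False', and it swaps in an element-wise comparison (rather than .replace) so that the result is guaranteed to have bool dtype. The names bool_df and cleaned are illustrative.

```python
import pandas as pd

# The string-valued data from the question.
output = {
    'id1': ['False', 'False', 'False', 'False'],
    'id2': ['True', 'False', 'False', 'False'],
    'id3': ['True', 'False', 'True', 'False'],
    'id4': ['True', 'False', 'True', 'True'],
}
df = pd.DataFrame(output)

# Comparing against the string 'True' yields a genuinely Boolean frame.
# Assumption: any cell that is not exactly 'True' should count as False.
bool_df = df == 'True'

# Keep only the rows that contain at least one True.
cleaned = bool_df[bool_df.any(axis=1)]
print(cleaned)
```

If the data may contain other spellings (for example 'true' or missing values), the comparison would need to be adapted before relying on it.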

Upvotes: 2
