Shri

Reputation: 99

How to delete rows in pandas based on conditions?

I have the following data frame

df = pd.DataFrame([['1','aa','ccc','rere','thth','my desc 1','','my feature2 1'], ['1','aa','fff','flfl','ipip','my desc 2','',''], ['1','aa','mmm','rprp','','','',''], ['2','aa','ccc','rprp','','','my feature1 1',''], ['2','aa','fff','bubu','thth','my desc 3','',''], ['2','aa','mmm','fafa','rtrt','my desc 4','',''], ['3','aa','ccc','blbl','thth','my desc 5','my feature1 2','my feature2 2'], ['3','aa','fff','arar','amam','my desc 6','',''], ['3','aa','mmm','acac','ryry','my desc 7','',''],['4','bb','coco','rere','','','','my feature2 3'], ['4','bb','inin','mimi','rere','my desc 8','',''], ['4','bb','itit','toto','enen','my desc 9','',''], ['4','bb','spsp','glgl','pepe','my desc 10','',''], ['5','bb','coco','baba','mpmp','my desc 11','my feature1 3',''], ['5','bb','inin','rere','','','',''],['5','bb','itit','toto','hrhr','my desc 12','',''], ['5','bb','spsp','glgl','lolo','my desc 13','','']], columns=['foo', 'bar','name_input','value_input','bulb','desc','feature1', 'feature2'])

Now I need to delete rows to get the output below.

df = pd.DataFrame([['1','aa','ccc','rere','thth','my desc 1','','my feature2 1'], ['2','aa','ccc','rprp','','my desc 3','my feature1 1',''], ['3','aa','ccc','blbl','thth','my desc 5','my feature1 2','my feature2 2'], ['4','bb','coco','rere','','my desc 8','','my feature2 3'], ['5','bb','coco','baba','mpmp','my desc 11','my feature1 3','']], columns=['foo', 'bar','name_input','value_input','bulb','desc','feature1', 'feature2'])

I tried the following, but none of these attempts work.

df= df.dropna(subset=['feature1', 'feature2'])
df.dropna(thresh=5, axis=0, inplace=True)
df= df[df.feature2.notnull()]
df= df[pd.notnull(df[['feature1', 'feature2']])]

Any help is much appreciated!

Upvotes: 2

Views: 58

Answers (2)

piRSquared

Reputation: 294218

astype(bool)

Empty strings evaluate as False in a boolean context. Use filter to select just the columns whose names contain fea (here feature1 and feature2), then apply astype(bool) followed by any(axis=1) to keep the rows where at least one of those columns is non-empty.

df[df.filter(regex='fea').astype(bool).any(axis=1)]

   foo bar name_input value_input  bulb        desc       feature1       feature2
0    1  aa        ccc        rere  thth   my desc 1                 my feature2 1
3    2  aa        ccc        rprp                    my feature1 1               
6    3  aa        ccc        blbl  thth   my desc 5  my feature1 2  my feature2 2
9    4  bb       coco        rere                                   my feature2 3
13   5  bb       coco        baba  mpmp  my desc 11  my feature1 3     
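
As a minimal sketch of why this works where dropna did not: the blanks in the question are empty strings, not NaN, so the frame has no missing values for dropna to find. The three-row frame `small` below is a hypothetical cut-down stand-in for the question's data, and `dtype=object` is an assumption made so the cells stay plain Python strings.

```python
import pandas as pd

# Hypothetical cut-down frame; not the full data from the question.
small = pd.DataFrame(
    [["1", "ccc", "", "my feature2 1"],
     ["1", "fff", "", ""],
     ["2", "ccc", "my feature1 1", ""]],
    columns=["foo", "name_input", "feature1", "feature2"],
    dtype=object,
)

# Empty strings are not missing values, so dropna/notnull see nothing to drop.
has_nan = small.isna().any().any()

# astype(bool) maps '' to False and any non-empty string to True;
# any(axis=1) is True when at least one feature column is non-empty.
mask = small.filter(regex="fea").astype(bool).any(axis=1)

kept = small[mask]
print(kept)
```

Only the rows at index 0 and 2 survive, because the middle row has both feature columns empty.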

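To match your expected output, we can back fill the desc column: where turns the empty strings into NaN, and bfill then fills each gap with the next non-empty description.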

feat = df.filter(regex='feat').astype(bool).any(axis=1)
desc = df.desc.where(df.desc.astype(bool)).bfill()
df.assign(desc=desc)[feat]

   foo bar name_input value_input  bulb        desc       feature1       feature2
0    1  aa        ccc        rere  thth   my desc 1                 my feature2 1
3    2  aa        ccc        rprp         my desc 3  my feature1 1               
6    3  aa        ccc        blbl  thth   my desc 5  my feature1 2  my feature2 2
9    4  bb       coco        rere         my desc 8                 my feature2 3
13   5  bb       coco        baba  mpmp  my desc 11  my feature1 3               
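
Here is a small sketch of the back-fill step on its own, again on a hypothetical three-row frame (`small`, with `dtype=object` assumed) rather than the question's data. It shows the empty desc in the first row picking up the description from the row below it before the filter is applied.

```python
import pandas as pd

# Hypothetical cut-down frame; not the full data from the question.
small = pd.DataFrame(
    {"desc": ["", "my desc 3", "my desc 4"],
     "feature1": ["my feature1 1", "", ""]},
    dtype=object,
)

# Rows that have at least one non-empty feature column.
has_feature = small.filter(regex="feat").astype(bool).any(axis=1)

# where() keeps non-empty descriptions and puts NaN elsewhere;
# bfill() then copies the next valid description upward.
desc = small["desc"].where(small["desc"].astype(bool)).bfill()

result = small.assign(desc=desc)[has_feature]
print(result)
```

Note that a plain bfill ignores the foo grouping, so a group with no description at all would borrow one from the next group. If that matters for your data, something like `desc.groupby(df['foo']).bfill()` should restrict the fill to each group, though I have not checked it against the full frame.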

Upvotes: 3

Umar.H

Reputation: 23099

Another method is to convert the blank strings to true NaN values, then call dropna with how='all' so that a row is dropped only when both feature columns are missing. The final fillna('') restores the empty strings.

import numpy as np
df.replace('',np.nan).dropna(subset=['feature1','feature2'],how='all').fillna('')


   foo bar name_input value_input  bulb        desc       feature1       feature2
0    1  aa        ccc        rere  thth   my desc 1                 my feature2 1
3    2  aa        ccc        rprp                    my feature1 1               
6    3  aa        ccc        blbl  thth   my desc 5  my feature1 2  my feature2 2
9    4  bb       coco        rere                                   my feature2 3
13   5  bb       coco        baba  mpmp  my desc 11  my feature1 3               
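
To see the three steps separately, here is a minimal sketch on a hypothetical three-row frame (`small`, with `dtype=object` assumed; it is not the question's data). It also shows the difference between how='all' and the default how='any', which is why the plain dropna(subset=...) from the question would be too aggressive even after the blanks become NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical cut-down frame; not the full data from the question.
small = pd.DataFrame(
    [["1", "", "my feature2 1"],
     ["1", "", ""],
     ["2", "my feature1 1", ""]],
    columns=["foo", "feature1", "feature2"],
    dtype=object,
)

# Step 1: turn empty strings into real missing values.
with_nan = small.replace("", np.nan)

# Step 2: drop a row only when BOTH feature columns are missing.
kept = with_nan.dropna(subset=["feature1", "feature2"], how="all")

# The default how='any' drops a row if EITHER column is missing,
# which removes every row of this small frame.
too_strict = with_nan.dropna(subset=["feature1", "feature2"])

# Step 3: put the empty strings back for display.
result = kept.fillna("")
print(result)
```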

Upvotes: 2
