Reputation: 99
I have the following data frame:
df = pd.DataFrame([['1','aa','ccc','rere','thth','my desc 1','','my feature2 1'], ['1','aa','fff','flfl','ipip','my desc 2','',''], ['1','aa','mmm','rprp','','','',''], ['2','aa','ccc','rprp','','','my feature1 1',''], ['2','aa','fff','bubu','thth','my desc 3','',''], ['2','aa','mmm','fafa','rtrt','my desc 4','',''], ['3','aa','ccc','blbl','thth','my desc 5','my feature1 2','my feature2 2'], ['3','aa','fff','arar','amam','my desc 6','',''], ['3','aa','mmm','acac','ryry','my desc 7','',''],['4','bb','coco','rere','','','','my feature2 3'], ['4','bb','inin','mimi','rere','my desc 8','',''], ['4','bb','itit','toto','enen','my desc 9','',''], ['4','bb','spsp','glgl','pepe','my desc 10','',''], ['5','bb','coco','baba','mpmp','my desc 11','my feature1 3',''], ['5','bb','inin','rere','','','',''],['5','bb','itit','toto','hrhr','my desc 12','',''], ['5','bb','spsp','glgl','lolo','my desc 13','','']], columns=['foo', 'bar','name_input','value_input','bulb','desc','feature1', 'feature2'])
Now I need to delete rows to get the output below:
df = pd.DataFrame([['1','aa','ccc','rere','thth','my desc 1','','my feature2 1'], ['2','aa','ccc','rprp','','my desc 3','my feature1 1',''], ['3','aa','ccc','blbl','thth','my desc 5','my feature1 2','my feature2 2'], ['4','bb','coco','rere','','my desc 8','','my feature2 3'], ['5','bb','coco','baba','mpmp','my desc 11','my feature1 3','']], columns=['foo', 'bar','name_input','value_input','bulb','desc','feature1', 'feature2'])
I tried the following, but none of these seem to work:
df= df.dropna(subset=['feature1', 'feature2'])
df.dropna(thresh=5, axis=0, inplace=True)
df= df[df.feature2.notnull()]
df= df[pd.notnull(df[['feature1', 'feature2']])]
Any help is much appreciated!
Upvotes: 2
Views: 58
Reputation: 294218
astype(bool)
Empty strings evaluate as False in a boolean context. Use filter to get just the columns whose names contain fea (here feature1 and feature2), then use astype(bool) followed by any(axis=1).
df[df.filter(regex='fea').astype(bool).any(axis=1)]
    foo  bar  name_input  value_input  bulb        desc       feature1       feature2
 0    1   aa         ccc         rere  thth   my desc 1                 my feature2 1
 3    2   aa         ccc         rprp                    my feature1 1
 6    3   aa         ccc         blbl  thth   my desc 5  my feature1 2  my feature2 2
 9    4   bb        coco         rere                                   my feature2 3
13    5   bb        coco         baba  mpmp  my desc 11  my feature1 3
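A minimal sketch of why the mask works while the dropna attempts from the question leave the frame unchanged: the blanks are empty strings, not NaN. The small frame `demo` and the names `unchanged`, `mask` and `kept` are made up for illustration, and `dtype=object` is an assumption so the columns hold plain Python strings.

```python
import pandas as pd

# Hypothetical miniature of the question's frame; blanks are empty strings.
demo = pd.DataFrame({
    'foo': ['1', '1', '2'],
    'feature1': ['', '', 'my feature1 1'],
    'feature2': ['my feature2 1', '', ''],
}, dtype=object)

# Empty strings are not missing values, so dropna removes nothing here.
unchanged = demo.dropna(subset=['feature1', 'feature2'])

# astype(bool) maps '' to False and any non-empty string to True;
# any(axis=1) is True for rows with at least one non-empty feature column.
mask = demo.filter(regex='fea').astype(bool).any(axis=1)
kept = demo[mask]
print(kept)
```

Only the rows at index 0 and 2 survive, since the middle row has both feature columns blank.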
To match your results, we can back fill the desc column:
feat = df.filter(regex='feat').astype(bool).any(axis=1)
desc = df.desc.where(df.desc.astype(bool)).bfill()
df.assign(desc=desc)[feat]
    foo  bar  name_input  value_input  bulb        desc       feature1       feature2
 0    1   aa         ccc         rere  thth   my desc 1                 my feature2 1
 3    2   aa         ccc         rprp         my desc 3  my feature1 1
 6    3   aa         ccc         blbl  thth   my desc 5  my feature1 2  my feature2 2
 9    4   bb        coco         rere         my desc 8                 my feature2 3
13    5   bb        coco         baba  mpmp  my desc 11  my feature1 3
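The back fill step can be sketched on a tiny frame to show what each call contributes. The frame `demo` and the names `filled` and `result` are hypothetical, and `dtype=object` is an assumption so the columns hold plain Python strings.

```python
import pandas as pd

# Hypothetical miniature: the row that carries a feature has a blank desc.
demo = pd.DataFrame({
    'foo': ['1', '1', '2', '2'],
    'desc': ['my desc 1', 'my desc 2', '', 'my desc 3'],
    'feature1': ['', '', 'my feature1 1', ''],
}, dtype=object)

# where() keeps non-empty descriptions and turns the blanks into NaN,
# so bfill() can pull the next non-empty description upward.
filled = demo['desc'].where(demo['desc'].astype(bool)).bfill()

# Same row mask as before: at least one non-empty feature column.
feat = demo.filter(regex='feat').astype(bool).any(axis=1)

# assign() swaps in the filled desc column before the rows are filtered.
result = demo.assign(desc=filled)[feat]
print(result)
```

Note that bfill here runs over the whole column, so a blank desc takes the next non-empty value below it regardless of which foo group that value belongs to.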
Upvotes: 3
Reputation: 23099
Another method is to change your blank strings to true NaN values, then pass the how argument to dropna with all as the value:
import numpy as np
df.replace('',np.nan).dropna(subset=['feature1','feature2'],how='all').fillna('')
    foo  bar  name_input  value_input  bulb        desc       feature1       feature2
 0    1   aa         ccc         rere  thth   my desc 1                 my feature2 1
 3    2   aa         ccc         rprp                    my feature1 1
 6    3   aa         ccc         blbl  thth   my desc 5  my feature1 2  my feature2 2
 9    4   bb        coco         rere                                   my feature2 3
13    5   bb        coco         baba  mpmp  my desc 11  my feature1 3
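Broken into steps, the chain above might look like the following sketch. The frame `demo` and the intermediate names `with_nan`, `dropped` and `result` are hypothetical, and `dtype=object` is an assumption so the columns hold plain Python strings.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the question's frame; blanks are empty strings.
demo = pd.DataFrame({
    'foo': ['1', '1', '2'],
    'feature1': ['', '', 'my feature1 1'],
    'feature2': ['my feature2 1', '', ''],
}, dtype=object)

# Step 1: empty strings become real missing values.
with_nan = demo.replace('', np.nan)

# Step 2: how='all' drops a row only when every column in subset is NaN.
dropped = with_nan.dropna(subset=['feature1', 'feature2'], how='all')

# Step 3: turn the remaining NaN values back into empty strings.
result = dropped.fillna('')
print(result)
```

With the default how='any', step 2 would also drop rows that have only one of the two features filled in, which is why how='all' matters here.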
Upvotes: 2