Reputation: 537
I have multiple rows in my dataframe and a column called 'name'.
I want to groupby 'name' and remove every item in the group, if any row has an empty column (either, nan, none or an empty string).
How can I achieve this?
Input
name c0 c1 c2
193556 INFO 1
273142 OMN 1 1 1
256278 OMN 1 1 1
41165 INFO 1 1 1
339649 INFO 1 1 1
Output
name c0 c1 c2
273142 OMN 1 1 1
256278 OMN 1 1 1
Upvotes: 3
Views: 1320
Reputation: 863166
For better performance don't use groupby
, a better method is to get all names by the condition and filter by isin
:
If the empty value is NaN
, you need only change == ''
to .isnull()
.
names = df.loc[(df.iloc[:, 1:] == '').any(axis=1), 'name']
#alternative
#names = df.loc[(df.drop('names', axis=1) == '').any(axis=1), 'name']
df = df[~df['name'].isin(names)]
print (df)
name c0 c1 c2
273142 OMN 1 1 1
256278 OMN 1 1 1
Detail:
Check all values without first column by condition:
print ((df.iloc[:, 1:] == ''))
c0 c1 c2
193556 False True True
273142 False False False
256278 False False False
41165 False False False
339649 False False False
Check if there is at least one True
per column with any
:
print ((df.iloc[:, 1:] == '').any(axis=1))
193556 True
273142 False
256278 False
41165 False
339649 False
dtype: bool
Get names by filtering column name
:
print (df.loc[(df.iloc[:, 1:] == '').any(axis=1), 'name'])
193556 INFO
Name: name, dtype: object
Upvotes: 1
Reputation: 24261
You can use filter
, checking if an empty value (""
) exists in a group and filtering on that condition:
import pandas as pd
df.groupby('name').filter(lambda x: (x != "").all().all())
Upvotes: 2