gibo
gibo

Reputation: 537

Remove rows if group contains an empty column

I have multiple rows in my dataframe and a column called 'name'.

I want to groupby 'name' and remove every item in the group, if any row has an empty column (either, nan, none or an empty string).

How can I achieve this?

Input

       name    c0  c1  c2
193556   INFO    1       
273142    OMN    1  1   1
256278    OMN    1  1   1
41165    INFO    1  1   1
339649   INFO    1  1   1

Output

       name    c0  c1  c2
273142    OMN    1  1   1
256278    OMN    1  1   1

Upvotes: 3

Views: 1320

Answers (2)

jezrael
jezrael

Reputation: 863166

For better performance don't use groupby, a better method is to get all names by the condition and filter by isin:

If the empty value is NaN, you need only change == '' to .isnull().

names = df.loc[(df.iloc[:, 1:] == '').any(axis=1), 'name']
#alternative
#names = df.loc[(df.drop('names', axis=1) == '').any(axis=1), 'name']

df = df[~df['name'].isin(names)]
print (df)
       name  c0 c1 c2
273142  OMN   1  1  1
256278  OMN   1  1  1

Detail:

Check all values without first column by condition:

print ((df.iloc[:, 1:] == ''))
           c0     c1     c2
193556  False   True   True
273142  False  False  False
256278  False  False  False
41165   False  False  False
339649  False  False  False

Check if there is at least one True per column with any:

print ((df.iloc[:, 1:] == '').any(axis=1))
193556     True
273142    False
256278    False
41165     False
339649    False
dtype: bool

Get names by filtering column name:

print (df.loc[(df.iloc[:, 1:] == '').any(axis=1), 'name'])
193556    INFO
Name: name, dtype: object

Upvotes: 1

iacob
iacob

Reputation: 24261

You can use filter, checking if an empty value ("") exists in a group and filtering on that condition:

import pandas as pd

df.groupby('name').filter(lambda x: (x != "").all().all())

Upvotes: 2

Related Questions