user
user

Reputation: 771

How to count number of unique groups missing information in a groupby?

I have a dataframe that looks like this:

data = [
    ["AB", np.nan, 134, "ID1"],
    ["AB", np.nan, 252, "ID1"],
    ["BC", np.nan, 56, "ID2"],
    ["CD", np.nan, 159, "ID1"],
    ["DE", 32, np.nan, "ID3"],
]

df = pd.DataFrame(data, columns=["method", "var_1", "var_2", "ID"])
df

I am trying to get a count of unique IDs grouped by method which have missing values for var_1 and var_2 and am unable to find a way to do this.

I have been able to get counts of missing data using count() and size() and subtracting one from the other, but unfortunately I really need counts of unique IDs. It seems so simple I feel as though I must be missing something obvious!

My desired output is:

Where we are grouping by method and counting the number of unique IDs missing information for the other columns.

method  var_1  var_2
  AB      1      0
  BC      1      0
  CD      1      0
  DE      0      1

Upvotes: 3

Views: 479

Answers (2)

ALollz
ALollz

Reputation: 59579

To count unique IDs, check where it's null then max within [ID, method], to indicate any missing value within that [ID, method]. Then sum over the method to get the Number of unique IDS missing something.

(df[['var_1', 'var_2']].isnull()
    .groupby([df['ID'], df['method']]).max()
    .sum(level='method')

        var_1  var_2
method              
AB          1      0
CD          1      0
BC          1      0
DE          0      1

Upvotes: 2

BENY
BENY

Reputation: 323356

In your case

g = df.groupby(['ID','method'])
out = g[['var_1','var_2']].apply(lambda x : x.isnull().sum())
            var_1  var_2
ID  method              
ID1 AB          1      0
    CD          1      0
ID2 BC          1      0
ID3 DE          0      1

Upvotes: 4

Related Questions