Reputation: 771
I have a dataframe that looks like this:
import numpy as np
import pandas as pd

data = [
["AB", np.nan, 134, "ID1"],
["AB", np.nan, 252, "ID1"],
["BC", np.nan, 56, "ID2"],
["CD", np.nan, 159, "ID1"],
["DE", 32, np.nan, "ID3"],
]
df = pd.DataFrame(data, columns=["method", "var_1", "var_2", "ID"])
df
I am trying to get a count of unique IDs, grouped by method, which have missing values for var_1 and var_2, and am unable to find a way to do this.
I have been able to get counts of missing data by using count() and size() and subtracting one from the other, but unfortunately I really need counts of unique IDs. It seems so simple that I feel I must be missing something obvious!
My desired output is below, where we group by method and count the number of unique IDs missing information in the other columns:
method var_1 var_2
AB 1 0
BC 1 0
CD 1 0
DE 0 1
Upvotes: 3
Views: 479
Reputation: 59579
To count unique IDs, check where the values are null, then take the max
within each [ID, method] group to flag whether that ID is missing anything. Then sum over method to get the number of unique IDs missing each variable.
(df[['var_1', 'var_2']].isnull()
   .groupby([df['ID'], df['method']]).max()
   .groupby(level='method').sum())

        var_1  var_2
method
AB          1      0
BC          1      0
CD          1      0
DE          0      1
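As a self-contained sketch of the same idea (the frame is rebuilt from the question; note that `sum(level=...)` was removed in pandas 2.0, so the per-method sum is spelled as a second groupby):

```python
import numpy as np
import pandas as pd

# Data copied from the question.
data = [
    ["AB", np.nan, 134, "ID1"],
    ["AB", np.nan, 252, "ID1"],
    ["BC", np.nan, 56, "ID2"],
    ["CD", np.nan, 159, "ID1"],
    ["DE", 32, np.nan, "ID3"],
]
df = pd.DataFrame(data, columns=["method", "var_1", "var_2", "ID"])

# Flag, per (ID, method), whether each variable is ever missing,
# then count the flagged IDs per method.
flags = df[["var_1", "var_2"]].isna().groupby([df["ID"], df["method"]]).max()
counts = flags.groupby(level="method").sum()
print(counts)
```

The max collapses the two AB rows for ID1 into a single flag, which is why AB counts 1 rather than 2.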
Upvotes: 2
Reputation: 323356
In your case
g = df.groupby(['ID','method'])
out = g[['var_1','var_2']].apply(lambda x : x.isnull().sum())
           var_1  var_2
ID  method
ID1 AB         2      0
    CD         1      0
ID2 BC         1      0
ID3 DE         0      1
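Note this counts missing rows per (ID, method) rather than unique IDs, which is why the two AB rows for ID1 give 2. A hedged sketch of one way to reduce it to the per-method unique-ID counts the question asks for (capping each group at 1 before summing):

```python
import numpy as np
import pandas as pd

# Data copied from the question.
data = [
    ["AB", np.nan, 134, "ID1"],
    ["AB", np.nan, 252, "ID1"],
    ["BC", np.nan, 56, "ID2"],
    ["CD", np.nan, 159, "ID1"],
    ["DE", 32, np.nan, "ID3"],
]
df = pd.DataFrame(data, columns=["method", "var_1", "var_2", "ID"])

# Count missing rows per (ID, method), as in the answer above.
g = df.groupby(["ID", "method"])
out = g[["var_1", "var_2"]].apply(lambda x: x.isnull().sum())

# An ID should count once per method no matter how many of its rows
# are missing, so turn the counts into 0/1 flags and sum over method.
unique_counts = out.gt(0).groupby(level="method").sum()
print(unique_counts)
```

`out.gt(0)` converts the row counts to booleans, so the final sum counts distinct IDs per method.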
Upvotes: 4