pandas: groupby two columns nunique

Question

I have the following sample set.

        CustID     Condition      Month        Reading  Consumption 
0     108000601         True       June       20110606      28320.0
1     108007000         True       July       20110705      13760.0
2     108007000         True     August       20110804      16240.0
3     108008000         True  September       20110901      12560.0
4     108008000         True    October       20111004      12400.0
5     108000601        False   November       20111101       9440.0
6     108090000        False   December       20111205      12160.0
7     108008000        False    January       20120106      11360.0
8     108000601         True   February       20120206      10480.0
9     108000601         True      March       20120306       9840.0

The following groupby provides me part of what I'm looking for.

dfm.groupby(['Condition'])['CustID'].nunique()

Condition
True      3
False     3

But how do I get unique ID's that match both conditions? e.g.

Condition
True      3
False     3
Both      2

maxymoo · Accepted Answer

Not sure if this is the most "pandas" way but you can use set to compare the users in each partition (the Python set data-structure is a hash table which will automatically discard duplicates):

custid_true = set(dfm[dfm['Condition']==True].CustID)
custid_false = set(dfm[dfm['Condition']==False].CustID)
custid_both = custid_true.intersection(custid_false)
n_custid_both = len(custid_both)

pandas: groupby two columns nunique

Answers (2)

Related Questions