NoIdeaHowToFixThis
NoIdeaHowToFixThis

Reputation: 4574

pandas dataframe count uniques with respect to another column

I have a dataframe that looks like the following

category  subcategory  contract     week1   week2  week3
cat1      sub1          11001       20     20      10 
cat1      sub1          11001       0      0       30  
cat1      sub2          11002       10     20      0  
cat1      sub2          11003       10     20      0  
cat2      sub3          11004       10     0       50 
cat2      sub3          11005       10     20      0 

I would like to count for each week the number of unique contracts that are non-zero in the week by category and subcategory.

category | subcategory | week1 | week2 | week3 |
-----------------------------------------------
cat1     | sub1        | 1     | 1     |   1   |
cat1     | sub2        | 2     | 2     |   0   |
cat2     | sub3        | 2     | 1     |   1   | 

I am trying to put together a toy example for this, but new with pandas so I also struggling there.

Upvotes: 1

Views: 217

Answers (1)

unutbu
unutbu

Reputation: 880757

First, group by 'category', 'subcategory', 'contract', take the sum and test if the sum is greater than zero:

In [179]: result = df.groupby(['category', 'subcategory', 'contract']).sum() > 0

In [180]: result
Out[180]: 
                              week1  week2  week3
category subcategory contract                    
cat1     sub1        11001     True   True   True
         sub2        11002     True   True  False
                     11003     True   True  False
cat2     sub3        11004     True  False   True
                     11005     True   True  False

Now group this result by 'category', and 'subcategory', and sum these groups to count the number of items in each group which are True:

In [181]: result.groupby(level=['category','subcategory']).sum().dropna(axis=0)
Out[181]: 
                      week1  week2  week3
category subcategory                     
cat1     sub1             1      1      1
         sub2             2      2      0
cat2     sub3             2      1      1

import io
import pandas as pd

df = '''\
category | subcategory | contract | week1 | week2 | week3
cat1     | sub1         | 11001 |      20 |    20 |    10
cat1     | sub1         | 11001 |      0  |    0  |    30   
cat1     | sub2         | 11002 |      10 |    20 |    0    
cat1     | sub2         | 11003 |      10 |    20 |    0    
cat2     | sub3         | 11004 |      10 |    0  |    50   
cat2     | sub3         | 11005 |      10 |    20 |    0 '''

df = pd.read_table(io.BytesIO(df), sep=r'\s*[|]\s*')
result = df.groupby(['category', 'subcategory', 'contract']).sum() > 0
result = result.groupby(level=['category','subcategory']).sum().dropna(axis=0)
print(result)

yields

                      week1  week2  week3
category subcategory                     
cat1     sub1             1      1      1
         sub2             2      2      0
cat2     sub3             2      1      1

Upvotes: 3

Related Questions