Reputation: 4574
I have a dataframe that looks like the following
category subcategory contract week1 week2 week3
cat1 sub1 11001 20 20 10
cat1 sub1 11001 0 0 30
cat1 sub2 11002 10 20 0
cat1 sub2 11003 10 20 0
cat2 sub3 11004 10 0 50
cat2 sub3 11005 10 20 0
I would like to count for each week the number of unique contracts that are non-zero in the week by category and subcategory.
category | subcategory | week1 | week2 | week3 |
-----------------------------------------------
cat1 | sub1 | 1 | 1 | 1 |
cat1 | sub2 | 2 | 2 | 0 |
cat2 | sub3 | 2 | 1 | 1 |
I am trying to put together a toy example for this, but new with pandas so I also struggling there.
Upvotes: 1
Views: 217
Reputation: 880757
First, group by 'category'
, 'subcategory'
, 'contract'
, take the sum and test if the sum is greater than zero:
In [179]: result = df.groupby(['category', 'subcategory', 'contract']).sum() > 0
In [180]: result
Out[180]:
week1 week2 week3
category subcategory contract
cat1 sub1 11001 True True True
sub2 11002 True True False
11003 True True False
cat2 sub3 11004 True False True
11005 True True False
Now group this result by 'category'
, and 'subcategory'
, and sum these groups to count the number of items in each group which are True:
In [181]: result.groupby(level=['category','subcategory']).sum().dropna(axis=0)
Out[181]:
week1 week2 week3
category subcategory
cat1 sub1 1 1 1
sub2 2 2 0
cat2 sub3 2 1 1
import io
import pandas as pd
df = '''\
category | subcategory | contract | week1 | week2 | week3
cat1 | sub1 | 11001 | 20 | 20 | 10
cat1 | sub1 | 11001 | 0 | 0 | 30
cat1 | sub2 | 11002 | 10 | 20 | 0
cat1 | sub2 | 11003 | 10 | 20 | 0
cat2 | sub3 | 11004 | 10 | 0 | 50
cat2 | sub3 | 11005 | 10 | 20 | 0 '''
df = pd.read_table(io.BytesIO(df), sep=r'\s*[|]\s*')
result = df.groupby(['category', 'subcategory', 'contract']).sum() > 0
result = result.groupby(level=['category','subcategory']).sum().dropna(axis=0)
print(result)
yields
week1 week2 week3
category subcategory
cat1 sub1 1 1 1
sub2 2 2 0
cat2 sub3 2 1 1
Upvotes: 3