Reputation: 17631
I have the following pandas dataframe:
import pandas as pd
import numpy as np
df = pd.DataFrame({"shops": ["shop1", "shop2", "shop3", "shop4", "shop5", "shop6"], "franchise" : ["franchise_A", "franchise_A", "franchise_A", "franchise_A", "franchise_B", "franchise_B"],"items" : ["dog", "cat", "dog", "dog", "bird", "fish"]})
df = df[["shops", "franchise", "items"]]
print(df)
shops franchise items
0 shop1 franchise_A dog
1 shop2 franchise_A cat
2 shop3 franchise_A dog
3 shop4 franchise_A dog
4 shop5 franchise_B bird
5 shop6 franchise_B fish
So, each row is a unique sample shop1
, shop2
, etc. whereby each sample belongs to a subgroup franchise_A
, franchise_B
, franchise_C
, etc.
In the items
column, there are only four categorical values possible: dog
, cat
, fish
, bird
. My motivation is to create a barplot of the number of dog
, cat
, fish
, bird
for each "franchise".
I would like the output to be
franchise dogs cats birds fish
franchise_A 3 1 0 0
franchise_B 0 0 1 1
I believe I first have to use groupby()
, e.g.
df.groupby("franchise").count()
shops items
franchise
franchise_A 4 4
franchise_B 2 2
But I'm not sure how I count the number of items for each franchise.
Upvotes: 6
Views: 9145
Reputation: 29690
You could include the items
column in the groupby
, then use size
.
>>> df.groupby(['franchise', 'items']).size().unstack(fill_value=0)
items bird cat dog fish
franchise
franchise_A 0 1 3 0
franchise_B 1 0 0 1
(Rough) Benchmark
%timeit df.groupby(['franchise', 'items']).size().unstack(fill_value=0)
100 loops, best of 3: 2.73 ms per loop
%timeit (df.groupby("franchise")['items'].apply(Counter).unstack(fill_value=0).astype(int))
100 loops, best of 3: 4.18 ms per loop
%timeit df.groupby('franchise')['items'].value_counts().unstack(fill_value=0)
100 loops, best of 3: 2.71 ms per loop
Upvotes: 4
Reputation: 862661
You can use value_counts
with unstack
, thanks Nickil Maveli:
from collections import Counter
print (df.groupby("franchise")['items'].value_counts().unstack(fill_value=0))
items bird cat dog fish
franchise
franchise_A 0 1 3 0
franchise_B 1 0 0 1
Another solutions with crosstab
and pivot_table
:
print (pd.crosstab(df["franchise"], df['items']))
items bird cat dog fish
franchise
franchise_A 0 1 3 0
franchise_B 1 0 0 1
print (df.pivot_table(index="franchise", columns='items', aggfunc='size', fill_value=0))
items bird cat dog fish
franchise
franchise_A 0 1 3 0
franchise_B 1 0 0 1
Upvotes: 13