Reputation: 59
I have created a pandas dataframe for a store
I have columns Transaction and Item_Type
import pandas as pd
data = {'Transaction':[1, 2, 2, 2, 3], 'Item_Type':['Food', 'Drink', 'Food', 'Drink', 'Food']}
df = pd.DataFrame(data, columns=['Transaction', 'Item_Type'])
Transaction Item_Type
1 Food
2 Drink
2 Food
2 Drink
3 Food
I am trying to group by transaction and count the number of drinks per transaction, but cannot find the right syntax to do it.
df = df.groupby(['Transaction','Item_Type']).size()
This sort of works, but gives me a multi-index Series, which I cannot yet figure out how to select drinks per transaction from it.
1/Food 1
2/Drink 2
2/Food 1
3/Food 1
This seems clunky - is there a better way?
This stackoverflow seemed most similar Adding a 'count' column to the result of a groupby in pandas?
Upvotes: 2
Views: 867
Reputation: 59549
With a particular condition, you can sum
the Boolean Series, within group, after you check the condition.
df['Item_Type'].eq('Drink').groupby(df['Transaction']).sum()
#Transaction
#1 0.0
#2 2.0
#3 0.0
#Name: Item_Type, dtype: float64
Upvotes: 2
Reputation: 25239
You may use agg
and value_counts
s = df.astype(str).agg('/'.join, axis=1).value_counts(sort=False)
Out[61]:
3/Food 1
2/Drink 2
1/Food 1
2/Food 1
dtype: int64
If you want to keep the original order, chain additional sort_index
s = df.astype(str).agg('/'.join, axis=1).value_counts().sort_index(kind='mergesort')
Out[62]:
1/Food 1
2/Drink 2
2/Food 1
3/Food 1
dtype: int64
Upvotes: 1
Reputation: 78690
Edit: replacing original xs
approach with unstack
after seeing anky's answer.
>>> df.groupby('Transaction')['Item_Type'].value_counts().unstack(fill_value=0)['Drink']
Transaction
1 0
2 2
3 0
Name: Drink, dtype: int64
Upvotes: 3
Reputation: 75080
Another way possible with pivot_table
:
s = df.pivot_table(index='Transaction',
columns='Item_Type',aggfunc=len).stack().astype(int)
Or:
s = df.pivot_table(index=['Transaction','Item_Type'],aggfunc=len) #@thanks @Ch3steR
s.index = s.index.map("{0[0]}/{0[1]}".format)
print(s)
1/Food 1
2/Drink 2
2/Food 1
3/Food 1
Or if you wish to filter a particular category:
to_filter = 'Drink'
(df.pivot_table(index='Transaction',columns='Item_Type',aggfunc=len,fill_value=0)
.filter(items=[to_filter]))
Item_Type Drink
Transaction
1 0
2 2
3 0
Upvotes: 5
Reputation: 59
I found a solution I think
Get statistics for each group (such as count, mean, etc) using pandas GroupBy?
df = df.groupby(['Transaction','Item_Type']).size().reset_index(name='counts')
Gives me the information I need
Transaction Item_Type counts
1 Food 1
2 Drink 2
2 Food 1
3 Food 1
Upvotes: 1