user14253628
user14253628

Reputation:

How often is an item in a purchase?

I would like to calculate how often an item appears in a shopping cart. I have a purchase recognizable by the buyerid. This buyerid can buy several items (also twice, triple,..., n-th times). Recognizable by itemid and description. I would like to count the number of times an item ends up in a shopping cart. For example, out of 5 purchases, 3 people bought an apple, i.e. 0.6%. I would like to spend this on all products, how do I do that?

import pandas as pd
d = {'buyerid': [0,0,1,2,3,3,3,4,4,4,4],
     'itemid': [0,1,2,1,1,1,2,4,5,1,1],
     'description': ['Banana', 'Apple', 'Apple', 'Strawberry', 'Apple', 'Banana', 'Apple', 'Dog-Food', 'Beef', 'Banana', 'Apple'], }
df = pd.DataFrame(data=d)
display(df.head(20))

enter image description here

My try:

# How many % of the articels are the same?
# this is wrong... :/
df_grouped = df.groupby('description').count()
display(df_grouped)

df_apple = df_grouped.iloc[0]
percentage = df_apple[0] / df.shape[0]
print(percentage)

[OUT] 0.45454545454545453

The mathematic formula

count of all buys (count_buy ) = 5
count how many an apple appears in the buy (count_apple) = 3
count_buy /count_apple =  3 / 5 = 0.6

What I would like to have (please note, I have not calculated the values, these are just dumy values)

enter image description here

Upvotes: 1

Views: 100

Answers (3)

jezrael
jezrael

Reputation: 862481

Use GroupBy.size and divide by count of unique values of buyerid by Series.nunique:

print (df.groupby(['itemid','description']).size())
itemid  description
0       Banana         1
1       Apple          3
        Banana         2
        Strawberry     1
2       Apple          2
4       Dog-Food       1
5       Beef           1
dtype: int64


purch = df['buyerid'].nunique()
df1 = df.groupby(['itemid','description']).size().div(purch).reset_index(name='percentage')
print (df1)
   itemid description  percentage
0       0      Banana         0.2
1       1       Apple         0.6
2       1      Banana         0.4
3       1  Strawberry         0.2
4       2       Apple         0.4
5       4    Dog-Food         0.2
6       5        Beef         0.2

Upvotes: 1

FloLie
FloLie

Reputation: 1840

As always, there are multiple ways to the gold, but i would go over pivoting as following: Your input:

import pandas as pd
d = {'buyerid': [0,0,1,2,3,3,3,4,4,4,4],
     'itemid': [0,1,2,1,1,1,2,4,5,1,1],
     'description': ['Banana', 'Apple', 'Apple', 'Strawberry', 'Apple', 'Banana', 'Apple', 'Dog-Food', 'Beef', 'Banana', 'Apple'], }
df = pd.DataFrame(data=d)

In a next step pivot the data with buyer_id as index and description as columns and replace NA with 0 as such

df2 = df.pivot_table(values='itemid', index='buyerid', columns='description', aggfunc='count')
df2 = df2.fillna(0)

resulting in

description  Apple  Banana  Beef  Dog-Food  Strawberry
buyerid                                               
0              1.0     1.0   0.0       0.0         0.0
1              1.0     0.0   0.0       0.0         0.0
2              0.0     0.0   0.0       0.0         1.0
3              2.0     1.0   0.0       0.0         0.0
4              1.0     1.0   1.0       1.0         0.0

calling the mean on the table:

df_final = df2.mean()

results in

description
Apple         1.0
Banana        0.6
Beef          0.2
Dog-Food      0.2
Strawberry    0.2
dtype: float64

Upvotes: 0

N. Arunoprayoch
N. Arunoprayoch

Reputation: 942

I would group it and create a new column as follows:

df_grp = df.groupby('description')['buyerid'].sum().reset_index(name='total')
df_grp['percentage'] = (df_grp.total / df_grp.total.sum()) * 100
df_grp

Result:

description total   percentage
0   Apple       11  39.285714
1   Banana      7   25.000000
2   Beef        4   14.285714
3   Dog-Food    4   14.285714
4   Strawberry  2   7.142857

Upvotes: 0

Related Questions