Reputation: 314
I have data for puchases like below
User Product
u1 p1
u1 p2
u2 p1
u3 p5
... ...
... ...
... ...
where users is large (order of 10k) and p's are small.
I want to compute a matrix type structure indicating how many people purchased p1, p1 and p2, p1 and p3 etc like below
p1 p2 p3 p4 p5 Total
p1 1 1 2
p2 1 1
p3
p4
p5 1 1
I really could not find a way to achieve this..What I have attempted is this..
pd.pivot_table(df,index='Product',columns='Product').fillna('')
Num
Product p1 p2 p5
Product
p1 1.0
p2 1.0
p5 1.0
what is the standard way to do this?
Upvotes: 0
Views: 52
Reputation: 10545
There is probably a more clever way to do this, but here is a straightforward approach:
import numpy as np
import pandas as pd
df = pd.DataFrame([['u1', 'p1'],
['u1', 'p2'],
['u2', 'p1'],
['u3', 'p5']],
columns=['User', 'Product'])
products = sorted(list(set(df.Product)))
n_products = len(products)
product_pairs = np.zeros(shape=(n_products, n_products),
dtype=int)
for cart in df.groupby('User')['Product'].apply(set):
for row, p1 in enumerate(products):
if p1 in cart:
for col, p2 in enumerate(products):
if p2 in cart:
product_pairs[row, col] += 1
pd.DataFrame(product_pairs, index=products,
columns=products)
p1 p2 p5
p1 2 1 0
p2 1 1 0
p5 0 0 1
Upvotes: 1