Reputation: 35
I have a table that contains the purchases of customers on a supermarket, the maximum number of products purchased is 11 and the minimum is 1.
Each row represents a unique customer:
data.head(9)
I want to group the purchased products according to their group (type) in order to analyze each group independently. for example, I want to group the Fruits products in column 1, the vegetable products in column 2 ...etc.
for example; the second customer doesn't buy vegetables so in the output, the vegetable cell is empty. But he bought Lip balm (it is a cosmetic), so in the output, it will be placed in the cosmetics column.
This table is the output of the first table, I use colors to differentiate between groups of products (types of product). I want output like that:
output.head(9)
I have another table that contains the groups and their all possible values:
groups.head()
NOTE: each customer has purchased only one product from the same group.
How can do that in python dataframe?
Upvotes: 1
Views: 93
Reputation: 7230
import pandas as pd
from io import StringIO
data = '''
item1,item2,item3
lemon,potato,
potato,yogurt,coconut
coconut,broccoli,
curd,,
'''
data = pd.read_csv(StringIO(data))
groups = '''
fruits,vegetables,dairy
lemon,potato,yogurt
coconut,broccoli,curd
,corn,
'''
groups = pd.read_csv(StringIO(groups))
groups = (
groups
.melt(var_name='group',value_name='product')
.dropna()
.set_index('product')
.squeeze()
)
output = data.apply(
lambda row: pd.Series({groups[p]: p for p in row.dropna()}),
axis=1
)
Upvotes: 1