how to group values belonging to same group in same column in pandas

Question

I have a table that contains the purchases of customers on a supermarket, the maximum number of products purchased is 11 and the minimum is 1.

Each row represents a unique customer:

data.head(9)

I want to group the purchased products according to their group (type) in order to analyze each group independently. for example, I want to group the Fruits products in column 1, the vegetable products in column 2 ...etc.

for example; the second customer doesn't buy vegetables so in the output, the vegetable cell is empty. But he bought Lip balm (it is a cosmetic), so in the output, it will be placed in the cosmetics column.

This table is the output of the first table, I use colors to differentiate between groups of products (types of product). I want output like that:

output.head(9)

I have another table that contains the groups and their all possible values:

groups.head()

NOTE: each customer has purchased only one product from the same group.

How can do that in python dataframe?

Vitalizzare · Accepted Answer

import pandas as pd
from io import StringIO

data = '''
item1,item2,item3
lemon,potato,
potato,yogurt,coconut
coconut,broccoli,
curd,,
'''

data = pd.read_csv(StringIO(data))

groups = '''
fruits,vegetables,dairy
lemon,potato,yogurt
coconut,broccoli,curd
,corn,
'''

groups = pd.read_csv(StringIO(groups))
groups = (
    groups
    .melt(var_name='group',value_name='product')
    .dropna()
    .set_index('product')
    .squeeze()
)

output = data.apply(
    lambda row: pd.Series({groups[p]: p for p in row.dropna()}), 
    axis=1
)

how to group values belonging to same group in same column in pandas

Answers (1)

Related Questions