Reputation: 1065
I have a data frame that look like this:
Customer_ID Category Products
1 Veg A
2 Veg B
3 Fruit A
3 Fruit B
3 Veg B
1 Fruit A
3 Veg C
1 Fruit C
I want to find out the for each customer ID for each category which products were bought, and create a column for each product accordingly. The output would look like this:
Customer_ID Category Pro_1 Pro_2 Pro_3
1 Veg A NA NA
1 Fruit A NA C
2 Veg NA B NA
3 Veg NA B C
3 Fruit A B NA
Upvotes: 2
Views: 297
Reputation: 862671
Use groupby
with unstack
, but if duplicates rows data are concanecate together:
df = df.groupby(['Customer_ID','Category','Products'])['Products'].sum().unstack()
df.columns = ['Pro_{}'.format(x) for x in range(1, len(df.columns)+1)]
df = df.reset_index()
print (df)
Customer_ID Category Pro_1 Pro_2 Pro_3
0 1 Fruit A None C
1 1 Veg A None None
2 2 Veg None B None
3 3 Fruit A B None
4 3 Veg None B C
Another solution with helper column, triples has to be unique:
#if not unique triples remove duplicates
df = df.drop_duplicates(['Customer_ID','Category','Products'])
df['a'] = df['Products']
df = df.set_index(['Customer_ID','Category','Products'])['a'].unstack()
df.columns = ['Pro_{}'.format(x) for x in range(1, len(df.columns)+1)]
df = df.reset_index()
print (df)
Customer_ID Category Pro_1 Pro_2 Pro_3
0 1 Fruit A None C
1 1 Veg A None None
2 2 Veg None B None
3 3 Fruit A B None
4 3 Veg None B C
Upvotes: 1
Reputation: 5961
Another option using crosstab
:
pd.crosstab([df['Customer_ID'],df['Category']], df['Products'])
output:
Products A B C
Customer_ID Category
1 Fruit 1 0 1
Veg 1 0 0
2 Veg 0 1 0
3 Fruit 1 1 0
Veg 0 1 1
Afterwards you can reset the index for a similar solution to what you wanted.
df = df.reset_index()
Products Customer_ID Category A B C
0 1 Fruit 1 0 1
1 1 Veg 1 0 0
2 2 Veg 0 1 0
3 3 Fruit 1 1 0
4 3 Veg 0 1 1
Upvotes: 1
Reputation: 1028
Try this: (don't mind the IO thing it is just for simple copy/paste)
import pandas as pd
from io import StringIO
df = pd.read_csv(StringIO("""
Customer_ID Category Products
1 Veg A
2 Veg B
3 Fruit A
3 Fruit B
3 Veg B
1 Fruit A
3 Veg C
1 Fruit C"""), sep='\s+')
df = df.join(pd.get_dummies(df['Products']))
g = df.groupby(['Customer_ID', 'Category']).sum()
print(g)
output:
A B C
Customer_ID Category
1 Fruit 1 0 1
Veg 1 0 0
2 Veg 0 1 0
3 Fruit 1 1 0
Veg 0 1 1
Upvotes: 0