Reputation:
I have two dataframes. The first is longitudinal and has three columns 'Customer', 'Product' and 'Category'. It has one entry per item that a customer buys.
The second should contain one row per Customer and one column per Category. In each of these columns it should give all Products this Customer bought from the respective category as a single string separated by blanks.
df_1 = {'Customer': [1, 1, 1, 1, 2, 2, 2, 2],
'Product': ['Apple', 'Chocolate', 'Orange', 'Cookie', 'Banana', 'Apple', 'Cookie', 'Apple'],
'Category': ['Fruit', 'Sweets', 'Fruit', 'Sweets', 'Fruit', 'Fruit', 'Sweets', 'Fruit']}
The target DataFrame should look like this:
df_2 = {'Customer': [1, 2],
'Fruits': ['Apple Orange', 'Banana Apple Apple'],
'Sweets': ['Chocolate Cookie', 'Cookie']}
How to best approach this.
Upvotes: 0
Views: 48
Reputation: 15364
Here's an option:
df_1.groupby(['Customer', 'Category'])['Product'].apply(' '.join).unstack()
Upvotes: 0
Reputation: 14949
Use pivot_table
with aggfunc=' '.join
:
df_2 = (
(df_1.pivot_table(
index='Customer',
columns='Category',
values='Product',
aggfunc=' '.join)
)
.rename_axis(columns=None)
.reset_index()
)
Customer Fruit Sweets
0 1 Apple Orange Chocolate Cookie
1 2 Banana Apple Apple Cookie
Upvotes: 0