Reputation: 43
I have data for multiple customers in data frame as below-
Customer_id event_type month mins_spent 1 live CM 10 1 live CM1 10 1 catchup CM2 20 1 live CM2 30 2 live CM 45 2 live CM1 30 2 catchup CM2 20 2 live CM2 20
I need the result data frame so that there is one row for each customer and column are combined value of column month and event_type and value will be mins_spent. Result data frame as below-
Customer_id CM_live CM_catchup CM1_live CM1_catchup CM2_live CM2_catchup 1 10 0 10 0 30 20 2 45 0 30 0 20 20
Is there an efficient way to do this instead of iterating the input data frame and creating the new data frame ?
Upvotes: 0
Views: 204
Reputation: 2399
You can create a new column (key
) by concatenating columns month
and event_type
, and then use pivot()
to reshape your data.
(df.assign(key = lambda d: d['month'] + '_' + d['event_type'])
.pivot(
index='Customer_id',
columns='key',
values='mins_spent'
))
Upvotes: 1
Reputation: 14113
you can use pivot_table
# pivot your data frame
p = df.pivot_table(values='mins_spent', index='Customer_id',
columns=['month', 'event_type'], aggfunc=np.sum)
# flatten multi indexed columns with list comprehension
p.columns = ['_'.join(col) for col in p.columns]
CM_live CM1_live CM2_catchup CM2_live
Customer_id
1 10 10 20 30
2 45 30 20 20
Upvotes: 1