Reputation: 131
I want to iteratively append row values from multiple columns to a new column in a new DataFrame based on a group.
My goal is to have 1 row for each customer, with 1 column for the customer's ID and 1 column for their timeline that lists the date of each event followed by the event description, for all dates and events, in chronological order.
I have solved this with a series of dictionaries. I am searching for a clean, elegant, pandas-style way to accomplish this as this code will be run frequently with small changes to customers, events, etc.
Example:
import pandas as pd
df_have = pd.DataFrame({'Customer_ID':['customer_1','customer_1','customer_1','customer_2','customer_2'],
'Event':['purchased cornflakes','purchased eggs', 'purchased waffles','sold eggs','purchased cows'],
'Date':['2011-06-16','2011-06-13','2011-06-09','2011-06-13','2011-06-18']})
df_have['Date'] = pd.to_datetime(df_have['Date'])
df_have.sort_values(['Customer_ID','Date'], inplace =True)
df_have
df_want = pd.DataFrame({'Customer_ID':['customer_1','customer_2'],
'Time_Line':[['2011-06-09,purchased waffles,2011-06-13,purchased eggs,2011-06-16,purchased cornflakes'],
['2011-06-13,sold eggs,2011-06-18,purchased cows']]})
df_want
Upvotes: 2
Views: 431
Reputation: 29711
Steps:
1) Set Customer_ID
to be the index axis as it would remain static throughout the operation.
2) stack
so that Date
and Event
fall below one another.
3) Peform groupby
w.r.t the index (level=0
) and convert the only column into list
. Since we've stacked them in this sequence, they would appear alternatingly.
# set maximum width of columns to be displayed
pd.set_option('max_colwidth', 100)
df_have.set_index('Customer_ID').stack(
).groupby(level=0).apply(list).reset_index(name="Time_Line")
To change the order in which sequence occurs inside the list
:
df_have.set_index('Customer_ID').reindex_axis(['Event', 'Date'], axis=1).stack(
).groupby(level=0).apply(list).reset_index(name="Time_Line")
Upvotes: 2