Pandas iteratively append row values from multiple DataFrame columns

Question

I want to iteratively append row values from multiple columns to a new column in a new DataFrame based on a group.

My goal is to have 1 row for each customer, with 1 column for the customer's ID and 1 column for their timeline that lists the date of each event followed by the event description, for all dates and events, in chronological order.

I have solved this with a series of dictionaries. I am searching for a clean, elegant, pandas-style way to accomplish this as this code will be run frequently with small changes to customers, events, etc.

Example:

import pandas as pd

df_have = pd.DataFrame({'Customer_ID':['customer_1','customer_1','customer_1','customer_2','customer_2'], 
                        'Event':['purchased cornflakes','purchased eggs', 'purchased waffles','sold eggs','purchased cows'],
                           'Date':['2011-06-16','2011-06-13','2011-06-09','2011-06-13','2011-06-18']})

df_have['Date'] = pd.to_datetime(df_have['Date'])

df_have.sort_values(['Customer_ID','Date'], inplace =True)
df_have

df I currently have

df_want = pd.DataFrame({'Customer_ID':['customer_1','customer_2'],
                       'Time_Line':[['2011-06-09,purchased waffles,2011-06-13,purchased eggs,2011-06-16,purchased cornflakes'],
                                   ['2011-06-13,sold eggs,2011-06-18,purchased cows']]})
df_want

df I'd like to have

Nickil Maveli · Accepted Answer

Steps:

1) Set Customer_ID to be the index axis as it would remain static throughout the operation.

2) stack so that Date and Event fall below one another.

3) Peform groupby w.r.t the index (level=0) and convert the only column into list. Since we've stacked them in this sequence, they would appear alternatingly.

# set maximum width of columns to be displayed
pd.set_option('max_colwidth', 100)

df_have.set_index('Customer_ID').stack(
    ).groupby(level=0).apply(list).reset_index(name="Time_Line")

To change the order in which sequence occurs inside the list:

df_have.set_index('Customer_ID').reindex_axis(['Event', 'Date'], axis=1).stack(
    ).groupby(level=0).apply(list).reset_index(name="Time_Line")

Pandas iteratively append row values from multiple DataFrame columns

Answers (1)

Related Questions