shallow_water
shallow_water

Reputation: 131

Pandas iteratively append row values from multiple DataFrame columns

I want to iteratively append row values from multiple columns to a new column in a new DataFrame based on a group.

My goal is to have 1 row for each customer, with 1 column for the customer's ID and 1 column for their timeline that lists the date of each event followed by the event description, for all dates and events, in chronological order.

I have solved this with a series of dictionaries. I am searching for a clean, elegant, pandas-style way to accomplish this as this code will be run frequently with small changes to customers, events, etc.

Example:

import pandas as pd

df_have = pd.DataFrame({'Customer_ID':['customer_1','customer_1','customer_1','customer_2','customer_2'], 
                        'Event':['purchased cornflakes','purchased eggs', 'purchased waffles','sold eggs','purchased cows'],
                           'Date':['2011-06-16','2011-06-13','2011-06-09','2011-06-13','2011-06-18']})

df_have['Date'] = pd.to_datetime(df_have['Date'])

df_have.sort_values(['Customer_ID','Date'], inplace =True)
df_have

df I currently have

df_want = pd.DataFrame({'Customer_ID':['customer_1','customer_2'],
                       'Time_Line':[['2011-06-09,purchased waffles,2011-06-13,purchased eggs,2011-06-16,purchased cornflakes'],
                                   ['2011-06-13,sold eggs,2011-06-18,purchased cows']]})
df_want

df I'd like to have

Upvotes: 2

Views: 431

Answers (1)

Nickil Maveli
Nickil Maveli

Reputation: 29711

Steps:

1) Set Customer_ID to be the index axis as it would remain static throughout the operation.

2) stack so that Date and Event fall below one another.

3) Peform groupby w.r.t the index (level=0) and convert the only column into list. Since we've stacked them in this sequence, they would appear alternatingly.


# set maximum width of columns to be displayed
pd.set_option('max_colwidth', 100)

df_have.set_index('Customer_ID').stack(
    ).groupby(level=0).apply(list).reset_index(name="Time_Line")

enter image description here


To change the order in which sequence occurs inside the list:

df_have.set_index('Customer_ID').reindex_axis(['Event', 'Date'], axis=1).stack(
    ).groupby(level=0).apply(list).reset_index(name="Time_Line")

enter image description here

Upvotes: 2

Related Questions