pandas dataframe add multiple rows for group of values with apply

Question

I have a pandas dataframe contains the following columns:

sate_1      state_2 
-----       -----
New York    Washington
Ohio        Utah

what I want to do is to construct an input date for each distinct combination of state_1 and state_2 so the results would be:

sate_1       state_2          date_time
New York     Washington     2017-11-01 00:00
New York     Washington     2017-11-01 03:00
New York     Washington     2017-11-01 06:00
Ohio         Utah           2017-11-01 00:00
Ohio         Utah           2017-11-01 03:00
Ohio         Utah           2017-11-01 06:00

so basically for each distinct row I want to return 3 time periods starting from today,s date. I have the logic to get the datetime but I'm doing this through a loop of all the records which is a time consuming since my data would contain more than 1M rows . I was wondering is there a way to do this using the apply function, which I'm assuming it will be much faster.

Peter Leimbigler · Accepted Answer

Setup

import pandas as pd
df = pd.DataFrame({'state_1': ['New York', 'Ohio'],
                   'state_2': ['Washington', 'Utah']})
# df:
#       state_1     state_2
# 0     New York    Washington
# 1     Ohio        Utah

Add the first of three desired dates to each row

df['date'] = pd.to_datetime('2017-11-01')
# df:
#       state_1     state_2     date
# 0     New York    Washington      2017-11-01
# 1     Ohio        Utah            2017-11-01

For each row, create a new DataFrame with all three desired dates, broadcast state names across these dates, and concatenate the list of DataFrames into one

pd.concat([
    pd.DataFrame(
        {'state_1': row.state_1,
         'state_2': row.state_2,
         'date': pd.date_range(row.date, freq='3h', periods=3)
        }
    ) for i, row in df.iterrows()
], ignore_index=True).loc[:, ['state_1', 'state_2', 'date']]
# df:
#   state_1     state_2     date
# 0 New York    Washington  2017-11-01 00:00:00
# 1 New York    Washington  2017-11-01 03:00:00
# 2 New York    Washington  2017-11-01 06:00:00
# 3 Ohio        Utah        2017-11-01 00:00:00
# 4 Ohio        Utah        2017-11-01 03:00:00
# 5 Ohio        Utah        2017-11-01 06:00:00

pandas dataframe add multiple rows for group of values with apply

Answers (1)

Setup

Add the first of three desired dates to each row

For each row, create a new DataFrame with all three desired dates, broadcast state names across these dates, and concatenate the list of DataFrames into one

Related Questions