Expand rows by date range having start and end in Pandas

Question

I'm working with a data set containing information on a phenomenon occurring during some time frames. I am given the start and end time of the event and its severity, as well as some other information. I would like to expand these frames over some larger time period by expanding the rows within set time periods and leaving the rest of the information as NaNs.

Data set example:

                         date_end         severity   category
     date_start           
2018-01-04 07:00:00  2018-01-04 10:00:00     12          1
2018-01-04 12:00:00  2018-01-04 13:00:00     44          2

What I want is:

                     severity   category
     date_start           
2018-01-04 07:00:00     12         1
2018-01-04 08:00:00     12         1
2018-01-04 09:00:00     12         1
2018-01-04 10:00:00     12         1
2018-01-04 11:00:00     nan       nan
2018-01-04 12:00:00     44         2
2018-01-04 13:00:00     44         2
2018-01-04 14:00:00     nan       nan
2018-01-04 15:00:00     nan       nan

What would be an efficient way of achieving such a result?

Code Different · Accepted Answer

Assuming you are on pandas v0.25, use explode:

df['hour'] = df.apply(lambda row: pd.date_range(row.name, row['date_end'], freq='H'), axis=1)
df = df.explode('hour').reset_index() \
        .drop(columns=['date_start', 'date_end']) \
        .rename(columns={'hour': 'date_start'}) \
        .set_index('date_start')

For the rows with nan, you may reindex your dataframe.

# Report from Jan 4 - 5, 2018, from 7AM - 7PM
days = pd.date_range('2018-01-04', '2018-01-05')
hours = pd.to_timedelta(range(7, 20), unit='h')
tmp = pd.MultiIndex.from_product([days, hours], names=['Date', 'Hour']).to_frame()

s = tmp['Date'] + tmp['Hour']
df.reindex(s)

Expand rows by date range having start and end in Pandas

Answers (2)

Related Questions