Aleks-1and
Aleks-1and

Reputation: 113

Expand rows by date range having start and end in Pandas

I'm working with a data set containing information on a phenomenon occurring during some time frames. I am given the start and end time of the event and its severity, as well as some other information. I would like to expand these frames over some larger time period by expanding the rows within set time periods and leaving the rest of the information as NaNs.

Data set example:

                         date_end         severity   category
     date_start           
2018-01-04 07:00:00  2018-01-04 10:00:00     12          1
2018-01-04 12:00:00  2018-01-04 13:00:00     44          2

What I want is:

                     severity   category
     date_start           
2018-01-04 07:00:00     12         1
2018-01-04 08:00:00     12         1
2018-01-04 09:00:00     12         1
2018-01-04 10:00:00     12         1
2018-01-04 11:00:00     nan       nan
2018-01-04 12:00:00     44         2
2018-01-04 13:00:00     44         2
2018-01-04 14:00:00     nan       nan
2018-01-04 15:00:00     nan       nan

What would be an efficient way of achieving such a result?

Upvotes: 11

Views: 9632

Answers (2)

Scott Boston
Scott Boston

Reputation: 153460

One way to do this is to reindex the datafame using pd.date_range, the use ffill and mask the values where the index is greater than date_end.

df.index = pd.to_datetime(df.index)

df['date_end'] = pd.to_datetime(df['date_end'])

df1 = df.reindex(pd.date_range(df.index.min(), '2018-01-04 15:00:00', freq='H'))

df1 = df1.ffill()

df1.loc[(df1.index - df1['date_end']) > pd.Timedelta(days=0)] = np.nan

df_out = df1.drop('date_end', axis=1)

print(df_out)

Output:

                     severity  category
2018-01-04 07:00:00      12.0       1.0
2018-01-04 08:00:00      12.0       1.0
2018-01-04 09:00:00      12.0       1.0
2018-01-04 10:00:00      12.0       1.0
2018-01-04 11:00:00       NaN       NaN
2018-01-04 12:00:00      44.0       2.0
2018-01-04 13:00:00      44.0       2.0
2018-01-04 14:00:00       NaN       NaN
2018-01-04 15:00:00       NaN       NaN

Upvotes: 3

Code Different
Code Different

Reputation: 93141

Assuming you are on pandas v0.25, use explode:

df['hour'] = df.apply(lambda row: pd.date_range(row.name, row['date_end'], freq='H'), axis=1)
df = df.explode('hour').reset_index() \
        .drop(columns=['date_start', 'date_end']) \
        .rename(columns={'hour': 'date_start'}) \
        .set_index('date_start')

For the rows with nan, you may reindex your dataframe.

# Report from Jan 4 - 5, 2018, from 7AM - 7PM
days = pd.date_range('2018-01-04', '2018-01-05')
hours = pd.to_timedelta(range(7, 20), unit='h')
tmp = pd.MultiIndex.from_product([days, hours], names=['Date', 'Hour']).to_frame()

s = tmp['Date'] + tmp['Hour']
df.reindex(s)

Upvotes: 11

Related Questions