Shifting datetime index in selected rows of Pandas DataFrame

Question

I have a Pandas DataFrame that contains half-hourly data. I want to shift a small range of its datetime index values forward by 1 hour:

import numpy as np
import pandas as pd
from datetime import datetime, date, timedelta

df = pd.DataFrame()
df['datetime'] = pd.date_range(start='2016-01-01', end='2019-01-01', freq='30T')
df = df.set_index('datetime')

print(df.index)

# shift 1 hour forward for custom date range
date1 = '2016-01-01 00:00:00'
date2 = '2016-03-02 23:49:50'
df[(df.index>=date1) & (df.index<=date2)].index += pd.Timedelta('1H')
#df.index = df[(df.index>=date1) & (df.index<=date2)].index.shift(periods=1,freq='H')

print(df.index)

I tried some different strategies (which don't work) but I want to know if there is a more compact way to do this. Note that this MWE has freq=30T but my actual data has freq=None.

Henry Ecker · Accepted Answer

The expression df[mask] produces a new DataFrame containing only the filtered rows, not a view onto df. So df[mask].index += pd.Timedelta('1H') updates the index of that temporary copy, and the copy is then thrown away because no variable references it. df itself is never changed.
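To see why the attempt in the question has no effect, here is a minimal sketch. The four-row frame and its value column are made up for illustration, and the temporary copy is given a name (subset) only so it can be inspected:

```python
import pandas as pd

# Small half-hourly frame standing in for the question's data (illustrative only).
idx = pd.date_range("2016-01-01", periods=4, freq="30min", name="datetime")
df = pd.DataFrame({"value": range(4)}, index=idx)

mask = df.index <= "2016-01-01 00:30:00"

# Boolean filtering returns a separate DataFrame object.
subset = df[mask]

# Rebinding the index here only affects the filtered copy.
subset.index = subset.index + pd.Timedelta(hours=1)

print(subset.index[0])  # first timestamp of the copy, shifted by one hour
print(df.index[0])      # first timestamp of the original, unchanged
```

In the question's one-liner the same thing happens, except the copy is never bound to a name, so the shifted index is lost immediately.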

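One approach is to write to the index's underlying array through .values, since item assignment on the Index itself fails with "Index does not support mutable operations". Note that this bypasses the Index's immutability, so it relies on .values returning a writable view of the index data: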

# shift 1 hour forward for custom date range
date1 = '2016-01-01 00:00:00'
date2 = '2016-03-02 23:49:50'
df.index.values[(date1 <= df.index) & (df.index <= date2)] += pd.Timedelta('1H')

df:

                         0
datetime                  
2016-01-01 01:00:00      0  # Now starts at 01:00:00
2016-01-01 01:30:00      1
2016-01-01 02:00:00      2
2016-01-01 02:30:00      3
2016-01-01 03:00:00      4

As for a more concise method, there aren't really any options when working with the index.

np.where works but is not necessarily more concise:

date1 = '2016-01-01 00:00:00'
date2 = '2016-03-02 23:49:50'
df.index = np.where((date1 <= df.index) & (df.index <= date2),
                    df.index + pd.Timedelta('1H'),
                    df.index)
df.index.name = 'datetime'  # add index name back if needed
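A closely related variant, not part of the original answer, is Index.where. It is sketched here under the assumption that it behaves on a DatetimeIndex as it does on other Index types: entries where the condition is True are kept and the rest are taken from the replacement, so the mask has to be inverted. The sample timestamps and the value column are invented, and are irregular on purpose so that the index has freq=None like the asker's real data:

```python
import pandas as pd

# Irregular timestamps, so the index has freq=None (illustrative data only).
idx = pd.DatetimeIndex(
    ["2016-01-01 00:00", "2016-01-01 00:30", "2016-03-02 23:30", "2016-03-03 02:10"],
    name="datetime",
)
df = pd.DataFrame({"value": range(4)}, index=idx)

date1 = "2016-01-01 00:00:00"
date2 = "2016-03-02 23:49:50"
m = (date1 <= df.index) & (df.index <= date2)

# Index.where keeps entries where the condition is True and uses the
# replacement elsewhere, so pass ~m to shift only the rows inside the range.
df.index = df.index.where(~m, df.index + pd.Timedelta(hours=1))

print(df.index)
```

Because the result is still a DatetimeIndex built from the original one, the index name should carry over without the extra assignment that the np.where version needs.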

Converting the index with to_series and then using between is also an option, but it requires making a complete copy of the index, which is likely unnecessary:

date1 = '2016-01-01 00:00:00'
date2 = '2016-03-02 23:49:50'
df.index.values[
    df.index.to_series().between(date1, date2)
] += pd.Timedelta('1H')

Index.shift also works if the mask is applied on both sides of the assignment, but this requires masking the index twice:

date1 = '2016-01-01 00:00:00'
date2 = '2016-03-02 23:49:50'
m = (date1 <= df.index) & (df.index <= date2)
df.index.values[m] = df.index[m].shift(freq='1H')
