Reputation: 1939
I want to shift a pandas column by an amount of time, and reindex the dataframe to accommodate this shift. Take the following dataframe:
df = pd.DataFrame({"Col1": [10, 20, 15, 30, 45],
"Col2": [13, 23, 18, 33, 48],
"Col3": [17, 27, 22, 37, 52]},
index=pd.date_range("11:00", "13:00", freq="30min"))
I would like to shift Col1
by 15 minutes and have the datetime index of the dataframe be updated to allow for these new values. However, if I shift Col1
by 15 minutes, you can see that because it doesn't align with the index, the entire column is just set to NaN
values:
df["Col1"] = df["Col1"].shift(15, freq="T")
print(df)
Col1 Col2 Col3
2021-03-25 11:00:00 NaN 13 17
2021-03-25 11:30:00 NaN 23 27
2021-03-25 12:00:00 NaN 18 22
2021-03-25 12:30:00 NaN 33 37
2021-03-25 13:00:00 NaN 48 52
I would like the dataframe to look like this:
Col1 Col2 Col3
2021-03-25 11:00:00 NaN 13.0 17.0
2021-03-25 11:15:00 10.0 NaN NaN
2021-03-25 11:30:00 NaN 23.0 27.0
2021-03-25 11:45:00 20.0 NaN NaN
2021-03-25 12:00:00 NaN 18.0 22.0
2021-03-25 12:15:00 15.0 NaN NaN
2021-03-25 12:30:00 NaN 33.0 37.0
2021-03-25 12:45:00 30.0 NaN NaN
2021-03-25 13:00:00 NaN 48.0 52.0
2021-03-25 13:15:00 45.0 NaN NaN
(which I created with the following code:)
df = pd.DataFrame({"Col1": [float('nan'), 10, float('nan'), 20, float('nan'), 15, float('nan'), 30, float('nan'), 45],
"Col2": [13, float('nan'), 23, float('nan'), 18, float('nan'), 33, float('nan'), 48, float('nan')],
"Col3": [17, float('nan'), 27, float('nan'), 22, float('nan'), 37, float('nan'), 52, float('nan')]},
index=pd.date_range("11:00", "13:15", freq="15min"))
If there are any suggestions for this that would be greatly appreciated!
Upvotes: 2
Views: 1078
Reputation: 1939
BENY's answer works, but I found to be slow with my very large dataframe. As such, I did the following and it worked for me and was much quicker:
dt_index2 = pd.date_range(df.index[0], df.index[-1], freq="15min")
df = df.reindex(dt_index2)
df["Col1"] = df["Col1"].shift(15, freq="T")
print(df)
Col1 Col2 Col3
2021-03-25 11:00:00 NaN 13.0 17.0
2021-03-25 11:15:00 10.0 NaN NaN
2021-03-25 11:30:00 NaN 23.0 27.0
2021-03-25 11:45:00 20.0 NaN NaN
2021-03-25 12:00:00 NaN 18.0 22.0
2021-03-25 12:15:00 15.0 NaN NaN
2021-03-25 12:30:00 NaN 33.0 37.0
2021-03-25 12:45:00 30.0 NaN NaN
2021-03-25 13:00:00 NaN 48.0 52.0
EDIT:
The reason I wanted to do this task is because each column needed to be offset by its index (so Col1 by 1 second, Col2 by 2 seconds etc) and then all joined into one column. For this purpose, BENY's answer is better when used in conjunction with mine, as it keeps memory usage to a minimum. In this case, you should do the following:
dt_index2 = pd.date_range(df.index[0], df.index[-1], freq="S")
df2 = pd.DataFrame(columns=["concentration"], index=dt_index2)
df2["concentration"] = df2["concentration"].add(df.pop("Col1").shift(1, freq="S"), fill_value=0)
df2["concentration"] = df2["concentration"].add(df.pop("Col2").shift(2, freq="S"), fill_value=0)
df2["concentration"] = df2["concentration"].add(df.pop("Col3").shift(3, freq="S"), fill_value=0)
Using this ensures you only have a single column with a dense index, where as if you reindex the other dataframe, you end up with 3 columns.
Upvotes: 1
Reputation:
Another option is to use resample
:
df = df.resample("15T").first()
Col1 Col2 Col3
2021-03-24 11:00:00 10.0 13.0 17.0
2021-03-24 11:15:00 NaN NaN NaN
2021-03-24 11:30:00 20.0 23.0 27.0
2021-03-24 11:45:00 NaN NaN NaN
2021-03-24 12:00:00 15.0 18.0 22.0
2021-03-24 12:15:00 NaN NaN NaN
2021-03-24 12:30:00 30.0 33.0 37.0
2021-03-24 12:45:00 NaN NaN NaN
2021-03-24 13:00:00 45.0 48.0 52.0
Then, you can simply shift
Col1:
df.Col1 = df.Col1.shift(1)
Col1 Col2 Col3
2021-03-24 11:00:00 NaN 13.0 17.0
2021-03-24 11:15:00 10.0 NaN NaN
2021-03-24 11:30:00 NaN 23.0 27.0
2021-03-24 11:45:00 20.0 NaN NaN
2021-03-24 12:00:00 NaN 18.0 22.0
2021-03-24 12:15:00 15.0 NaN NaN
2021-03-24 12:30:00 NaN 33.0 37.0
2021-03-24 12:45:00 30.0 NaN NaN
2021-03-24 13:00:00 NaN 48.0 52.0
EDIT: This seems to be comparable in speed with @Recessive's answer:
def resampling(df):
df = df.resample("15T").first()
df.Col1 = df.Col1.shift(1)
def reindexing(df):
dt_index2 = pd.date_range(df.index[0], df.index[-1], freq="15min")
df = df.reindex(dt_index2)
df["Col1"] = df["Col1"].shift(15, freq="T")
%timeit resampling(df)
1.37 ms ± 10.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit reindexing(df)
1.11 ms ± 31.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Upvotes: 1
Reputation: 323226
Check with concat
out = pd.concat([df.pop("Col1").shift(15, freq="T"),df],axis=1)
Out[478]:
Col1 Col2 Col3
2021-03-24 11:00:00 NaN 13.0 17.0
2021-03-24 11:15:00 10.0 NaN NaN
2021-03-24 11:30:00 NaN 23.0 27.0
2021-03-24 11:45:00 20.0 NaN NaN
2021-03-24 12:00:00 NaN 18.0 22.0
2021-03-24 12:15:00 15.0 NaN NaN
2021-03-24 12:30:00 NaN 33.0 37.0
2021-03-24 12:45:00 30.0 NaN NaN
2021-03-24 13:00:00 NaN 48.0 52.0
2021-03-24 13:15:00 45.0 NaN NaN
Upvotes: 1