Recessive
Recessive

Reputation: 1939

Pandas how to shift column by datetime into datetime not in index

I want to shift a pandas column by an amount of time, and reindex the dataframe to accommodate this shift. Take the following dataframe:

df = pd.DataFrame({"Col1": [10, 20, 15, 30, 45],
                   "Col2": [13, 23, 18, 33, 48],
                   "Col3": [17, 27, 22, 37, 52]},
                  index=pd.date_range("11:00", "13:00", freq="30min"))

I would like to shift Col1 by 15 minutes and have the datetime index of the dataframe be updated to allow for these new values. However, if I shift Col1 by 15 minutes, you can see that because it doesn't align with the index, the entire column is just set to NaN values:

df["Col1"] = df["Col1"].shift(15, freq="T")
print(df)


                      Col1  Col2  Col3
2021-03-25 11:00:00   NaN    13    17
2021-03-25 11:30:00   NaN    23    27
2021-03-25 12:00:00   NaN    18    22
2021-03-25 12:30:00   NaN    33    37
2021-03-25 13:00:00   NaN    48    52

I would like the dataframe to look like this:

                     Col1  Col2  Col3
2021-03-25 11:00:00   NaN  13.0  17.0
2021-03-25 11:15:00  10.0   NaN   NaN
2021-03-25 11:30:00   NaN  23.0  27.0
2021-03-25 11:45:00  20.0   NaN   NaN
2021-03-25 12:00:00   NaN  18.0  22.0
2021-03-25 12:15:00  15.0   NaN   NaN
2021-03-25 12:30:00   NaN  33.0  37.0
2021-03-25 12:45:00  30.0   NaN   NaN
2021-03-25 13:00:00   NaN  48.0  52.0
2021-03-25 13:15:00  45.0   NaN   NaN

(which I created with the following code:)

df = pd.DataFrame({"Col1": [float('nan'), 10, float('nan'), 20, float('nan'), 15, float('nan'), 30, float('nan'), 45],
                   "Col2": [13, float('nan'),  23, float('nan'), 18, float('nan'), 33, float('nan'), 48, float('nan')],
                   "Col3": [17, float('nan'), 27, float('nan'), 22, float('nan'), 37, float('nan'), 52, float('nan')]},
                  index=pd.date_range("11:00", "13:15", freq="15min"))

If there are any suggestions for this that would be greatly appreciated!

Upvotes: 2

Views: 1078

Answers (3)

Recessive
Recessive

Reputation: 1939

BENY's answer works, but I found to be slow with my very large dataframe. As such, I did the following and it worked for me and was much quicker:

dt_index2 = pd.date_range(df.index[0], df.index[-1], freq="15min")
df = df.reindex(dt_index2)
df["Col1"] = df["Col1"].shift(15, freq="T")
print(df)

                     Col1  Col2  Col3
2021-03-25 11:00:00   NaN  13.0  17.0
2021-03-25 11:15:00  10.0   NaN   NaN
2021-03-25 11:30:00   NaN  23.0  27.0
2021-03-25 11:45:00  20.0   NaN   NaN
2021-03-25 12:00:00   NaN  18.0  22.0
2021-03-25 12:15:00  15.0   NaN   NaN
2021-03-25 12:30:00   NaN  33.0  37.0
2021-03-25 12:45:00  30.0   NaN   NaN
2021-03-25 13:00:00   NaN  48.0  52.0

EDIT:

The reason I wanted to do this task is because each column needed to be offset by its index (so Col1 by 1 second, Col2 by 2 seconds etc) and then all joined into one column. For this purpose, BENY's answer is better when used in conjunction with mine, as it keeps memory usage to a minimum. In this case, you should do the following:

dt_index2 = pd.date_range(df.index[0], df.index[-1], freq="S")
df2 = pd.DataFrame(columns=["concentration"], index=dt_index2)
df2["concentration"] = df2["concentration"].add(df.pop("Col1").shift(1, freq="S"), fill_value=0)
df2["concentration"] = df2["concentration"].add(df.pop("Col2").shift(2, freq="S"), fill_value=0)
df2["concentration"] = df2["concentration"].add(df.pop("Col3").shift(3, freq="S"), fill_value=0)

Using this ensures you only have a single column with a dense index, where as if you reindex the other dataframe, you end up with 3 columns.

Upvotes: 1

user2317421
user2317421

Reputation:

Another option is to use resample:

df = df.resample("15T").first()

                     Col1  Col2  Col3
2021-03-24 11:00:00  10.0  13.0  17.0
2021-03-24 11:15:00   NaN   NaN   NaN
2021-03-24 11:30:00  20.0  23.0  27.0
2021-03-24 11:45:00   NaN   NaN   NaN
2021-03-24 12:00:00  15.0  18.0  22.0
2021-03-24 12:15:00   NaN   NaN   NaN
2021-03-24 12:30:00  30.0  33.0  37.0
2021-03-24 12:45:00   NaN   NaN   NaN
2021-03-24 13:00:00  45.0  48.0  52.0

Then, you can simply shift Col1:

df.Col1 = df.Col1.shift(1)

                     Col1  Col2  Col3
2021-03-24 11:00:00   NaN  13.0  17.0
2021-03-24 11:15:00  10.0   NaN   NaN
2021-03-24 11:30:00   NaN  23.0  27.0
2021-03-24 11:45:00  20.0   NaN   NaN
2021-03-24 12:00:00   NaN  18.0  22.0
2021-03-24 12:15:00  15.0   NaN   NaN
2021-03-24 12:30:00   NaN  33.0  37.0
2021-03-24 12:45:00  30.0   NaN   NaN
2021-03-24 13:00:00   NaN  48.0  52.0

EDIT: This seems to be comparable in speed with @Recessive's answer:

def resampling(df):
    df = df.resample("15T").first()
    df.Col1 = df.Col1.shift(1)

def reindexing(df):
    dt_index2 = pd.date_range(df.index[0], df.index[-1], freq="15min")
    df = df.reindex(dt_index2)
    df["Col1"] = df["Col1"].shift(15, freq="T")

%timeit resampling(df)
1.37 ms ± 10.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit reindexing(df)
1.11 ms ± 31.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Upvotes: 1

BENY
BENY

Reputation: 323226

Check with concat

out = pd.concat([df.pop("Col1").shift(15, freq="T"),df],axis=1)
Out[478]: 
                     Col1  Col2  Col3
2021-03-24 11:00:00   NaN  13.0  17.0
2021-03-24 11:15:00  10.0   NaN   NaN
2021-03-24 11:30:00   NaN  23.0  27.0
2021-03-24 11:45:00  20.0   NaN   NaN
2021-03-24 12:00:00   NaN  18.0  22.0
2021-03-24 12:15:00  15.0   NaN   NaN
2021-03-24 12:30:00   NaN  33.0  37.0
2021-03-24 12:45:00  30.0   NaN   NaN
2021-03-24 13:00:00   NaN  48.0  52.0
2021-03-24 13:15:00  45.0   NaN   NaN

Upvotes: 1

Related Questions