Reputation: 14845
I have a dataframe that contains values for individual days:
day  value
  1   10.1
  2   15.4
  3   12.1
  4   14.1
  5   -9.7
  6    2.0
  8    3.4
There is not necessarily a value for each day (day 7 is missing in my example), but there is never more than one value per day.
I want to add additional columns to this dataframe, containing per row the value of the day before, the value of two days ago, the value of three days ago etc. The result would be:
day  value  value-of-1  value-of-2  value-of-3
  1   10.1         NaN         NaN         NaN
  2   15.4        10.1         NaN         NaN
  3   12.1        15.4        10.1         NaN
  4   14.1        12.1        15.4        10.1
  5   -9.7        14.1        12.1        15.4
  6    2.0        -9.7        14.1        12.1
  8    3.4         NaN         2.0        -9.7
At the moment, I add a column containing the required day to the original dataframe and then merge the dataframe with itself, using this new column as the join condition. After some reorganizing of the columns, I get my result:
import pandas as pd

data = [[1, 10.1], [2, 15.4], [3, 12.1], [4, 14.1], [5, -9.7], [6, 2.0], [8, 3.4]]
df = pd.DataFrame(data, columns=['day', 'value'])

def add_column_for_prev_day(df, day):
    # helper column holding the day whose value should be looked up
    df[f"day-{day}"] = df["day"] - day
    # self-merge on the helper column, then drop the helper columns and rename
    df = df.merge(df[["day", "value"]], how="left", left_on=f"day-{day}", right_on="day", suffixes=("", "_r")) \
        .drop(["day_r", f"day-{day}"], axis=1) \
        .rename({"value_r": f"value-of-{day}"}, axis=1)
    return df

df = add_column_for_prev_day(df, 1)
df = add_column_for_prev_day(df, 2)
df = add_column_for_prev_day(df, 3)
I wonder if there is a better and faster way to get the same result, especially without having to merge the dataframe over and over again.
A simple shift does not help as there are days without data.
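To make the problem with a plain shift concrete, here is a minimal sketch using the sample data from above (the variable names `naive` and `naive_day8` are only illustrative). `shift(1)` moves values by row position, so the row for day 8 receives the value of day 6, although day 7 has no data:

```python
import pandas as pd

data = [[1, 10.1], [2, 15.4], [3, 12.1], [4, 14.1], [5, -9.7], [6, 2.0], [8, 3.4]]
df = pd.DataFrame(data, columns=["day", "value"])

# shift() works on row positions, not on the values in the "day" column
naive = df["value"].shift(1)

# value that the row for day 8 receives from the positional shift
naive_day8 = naive[df["day"] == 8].iloc[0]
print(naive_day8)  # 2.0, the value of day 6; the wanted result is NaN
```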
Upvotes: 1
Views: 49
Reputation: 75080
You can use:
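# reindex on the full day range so that missing days become NaN rows
m = df.set_index('day').reindex(range(df['day'].min(), df['day'].max()+1))
l = [1, 2, 3]
for i in l:
    # the index has no gaps now, so a positional shift is a shift by i days
    m[f"value_of_{i}"] = m['value'].shift(i)
# keep only the days that are present in the original dataframe
m.reindex(df.day).reset_index()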
   day  value  value_of_1  value_of_2  value_of_3
0    1   10.1         NaN         NaN         NaN
1    2   15.4        10.1         NaN         NaN
2    3   12.1        15.4        10.1         NaN
3    4   14.1        12.1        15.4        10.1
4    5   -9.7        14.1        12.1        15.4
5    6    2.0        -9.7        14.1        12.1
6    8    3.4         NaN         2.0        -9.7
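If this is needed in more than one place, the same reindex-and-shift idea could be wrapped in a small helper. This is only a sketch: the function name `add_lagged_values` and its parameters are made up for illustration, and it assumes integer day numbers with at most one row per day, as stated in the question.

```python
import pandas as pd

def add_lagged_values(df, lags, day_col="day", value_col="value"):
    # hypothetical helper, not part of pandas
    days = df[day_col]
    # series of values on a gap-free day index; missing days become NaN
    full_range = range(int(days.min()), int(days.max()) + 1)
    full = df.set_index(day_col)[value_col].reindex(full_range)
    out = df.copy()
    for lag in lags:
        # on the gap-free index, shifting by `lag` rows equals `lag` days
        shifted = full.shift(lag)
        # look up the shifted values for the days that actually exist
        out[f"{value_col}_of_{lag}"] = shifted.reindex(days).to_numpy()
    return out

data = [[1, 10.1], [2, 15.4], [3, 12.1], [4, 14.1], [5, -9.7], [6, 2.0], [8, 3.4]]
df = pd.DataFrame(data, columns=["day", "value"])
result = add_lagged_values(df, [1, 2, 3])
print(result)
```

The helper works on a copy, so the original dataframe keeps its columns and index. Note that the intermediate series has one entry per day between the first and the last day, which may matter if the gaps are very large.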
Upvotes: 1