werner

Reputation: 14845

Write previous entries of a time series into additional columns

I have a dataframe that contains values for individual days:

  day  value
    1   10.1
    2   15.4
    3   12.1
    4   14.1
    5   -9.7
    6    2.0
    8    3.4

There is not necessarily a value for each day (day 7 is missing in my example), but there is never more than one value per day.

I want to add columns to this dataframe that contain, for each row, the value from the day before, from two days before, from three days before, and so on. The result would be:

  day  value  value-of-1  value-of-2  value-of-3
    1   10.1         NaN         NaN         NaN
    2   15.4        10.1         NaN         NaN
    3   12.1        15.4        10.1         NaN
    4   14.1        12.1        15.4        10.1
    5   -9.7        14.1        12.1        15.4
    6    2.0        -9.7        14.1        12.1
    8    3.4         NaN         2.0        -9.7

At the moment, I add a column containing the required day to the original dataframe and then merge the dataframe with itself, using this new column as the join condition. After some reorganizing of the columns, I get my result:

import pandas as pd

data = [[1, 10.1], [2, 15.4], [3, 12.1], [4, 14.1], [5, -9.7], [6, 2.0], [8, 3.4]]
df = pd.DataFrame(data, columns=['day', 'value'])

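def add_column_for_prev_day(df, day):
    # Helper column: the day whose value should be looked up
    df[f"day-{day}"] = df["day"] - day
    # Self-merge on the helper column, then drop the join columns and rename the looked-up value
    df = df.merge(df[["day", "value"]], how="left", left_on=f"day-{day}", right_on="day", suffixes=("", "_r")) \
        .drop(["day_r", f"day-{day}"], axis=1) \
        .rename({"value_r": f"value-of-{day}"}, axis=1)
    return df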

df = add_column_for_prev_day(df, 1)
df = add_column_for_prev_day(df, 2)
df = add_column_for_prev_day(df, 3)

I wonder if there is a better and faster way to get the same result, especially without having to merge the dataframe over and over again.

A simple shift does not help as there are days without data.
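
To illustrate the problem, here is a minimal sketch using the sample data above (the column name prev_row_value is made up for this illustration). shift(1) looks back one row, not one day, so the row for day 8 receives the value of day 6 instead of NaN:

```python
import pandas as pd

data = [[1, 10.1], [2, 15.4], [3, 12.1], [4, 14.1], [5, -9.7], [6, 2.0], [8, 3.4]]
df = pd.DataFrame(data, columns=["day", "value"])

# shift(1) moves every value down by one row, ignoring the gap in "day"
naive = df.assign(prev_row_value=df["value"].shift(1))

# The row for day 8 now holds 2.0 (the value of day 6),
# although there is no entry for day 7.
print(naive)
```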

Upvotes: 1

Views: 49

Answers (1)

anky

Reputation: 75080

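You can reindex on the full range of days so that missing days become NaN rows, shift, and then reindex back to the original days: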

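# Reindex on a complete day range so that missing days become NaN rows
m = df.set_index('day').reindex(range(df['day'].min(), df['day'].max() + 1))
l = [1, 2, 3]
for i in l:
    # With no gaps in the index, shifting by i rows is the same as looking back i days
    m[f"value_of_{i}"] = m['value'].shift(i)
# Keep only the days that exist in the original dataframe
m.reindex(df.day).reset_index()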

   day  value  value_of_1  value_of_2  value_of_3
0    1   10.1         NaN         NaN         NaN
1    2   15.4        10.1         NaN         NaN
2    3   12.1        15.4        10.1         NaN
3    4   14.1        12.1        15.4        10.1
4    5   -9.7        14.1        12.1        15.4
5    6    2.0        -9.7        14.1        12.1
6    8    3.4         NaN         2.0        -9.7
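
If the days span large gaps, building the full range may be wasteful. A possible alternative, which is a different technique from the reindex/shift code above, is to look up each "day minus lag" in a Series indexed by day using Series.map. The sketch below assumes the day values are unique (as stated in the question); the helper name add_lagged_values and its parameters are hypothetical:

```python
import pandas as pd

def add_lagged_values(df, lags, day_col="day", value_col="value"):
    """Return a copy of df with one '<value_col>_of_<lag>' column per lag.

    Hypothetical helper; assumes every day occurs at most once in df.
    """
    # Lookup Series: index is the day, values are the measurements
    lookup = df.set_index(day_col)[value_col]
    out = df.copy()
    for lag in lags:
        # Days that are not present in the lookup index map to NaN
        out[f"{value_col}_of_{lag}"] = (out[day_col] - lag).map(lookup)
    return out

data = [[1, 10.1], [2, 15.4], [3, 12.1], [4, 14.1], [5, -9.7], [6, 2.0], [8, 3.4]]
df = pd.DataFrame(data, columns=["day", "value"])

result = add_lagged_values(df, [1, 2, 3])
print(result)
```

This avoids both the repeated merges and the intermediate dataframe covering every day, and it leaves the original dataframe unchanged.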

Upvotes: 1
