bakunet

Reputation: 631

How to insert one row before each day, with value based on last row of previous day?

I am struggling with one thing in Pandas. I am not sure how to add a new row at the beginning of each day without using a for ... in loop, which takes a lot of time.

Below is a table where date_time is a timestamp column created with to_datetime. For each day, I need to add a row at 08:00:00+01:00, just before the 09:00:00+01:00 row, and copy the last value from the previous day into it.

I will be grateful for any help.

                      date_time  value
20437 2022-02-10 09:00:00+01:00  80.80 #<-before this
20438 2022-02-10 10:00:00+01:00  80.20
20439 2022-02-10 11:00:00+01:00  80.20
20440 2022-02-10 12:00:00+01:00  80.70
20441 2022-02-10 13:00:00+01:00  81.10
20442 2022-02-10 14:00:00+01:00  81.00
20443 2022-02-10 15:00:00+01:00  80.15
20444 2022-02-10 16:00:00+01:00  80.25
20445 2022-02-10 17:00:00+01:00  79.50
20446 2022-02-11 09:00:00+01:00  79.50 #<-before this
20447 2022-02-11 10:00:00+01:00  79.00
20448 2022-02-11 11:00:00+01:00  78.70
20449 2022-02-11 12:00:00+01:00  78.90
20450 2022-02-11 13:00:00+01:00  79.00
20451 2022-02-11 14:00:00+01:00  79.05
20452 2022-02-11 15:00:00+01:00  79.40
20453 2022-02-11 16:00:00+01:00  79.85
20454 2022-02-11 17:00:00+01:00  79.50

Upvotes: 1

Views: 71

Answers (1)

Laurent

Reputation: 13458

With the following dataframe:

import pandas as pd

df = pd.DataFrame(
    {
        "date_time": [
            "2022-02-10 09:00:00+01:00",
            "2022-02-10 10:00:00+01:00",
            "2022-02-10 11:00:00+01:00",
            "2022-02-10 12:00:00+01:00",
            "2022-02-10 13:00:00+01:00",
            "2022-02-10 14:00:00+01:00",
            "2022-02-10 15:00:00+01:00",
            "2022-02-10 16:00:00+01:00",
            "2022-02-10 17:00:00+01:00",
            "2022-02-11 09:00:00+01:00",
            "2022-02-11 10:00:00+01:00",
            "2022-02-11 11:00:00+01:00",
            "2022-02-11 12:00:00+01:00",
            "2022-02-11 13:00:00+01:00",
            "2022-02-11 14:00:00+01:00",
            "2022-02-11 15:00:00+01:00",
            "2022-02-11 16:00:00+01:00",
            "2022-02-11 17:00:00+01:00",
        ],
        "value": [
            80.8,
            80.2,
            80.2,
            80.7,
            81.1,
            81.0,
            80.15,
            80.25,
            79.5,
            79.5,
            79.0,
            78.7,
            78.9,
            79.0,
            79.05,
            79.4,
            79.85,
            79.5,
        ],
    }
)

You could try this, which copies each day's 09:00 row and shifts the copy back one hour:

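# Parse the strings into timezone-aware timestamps
df["date_time"] = pd.to_datetime(df["date_time"])

# Copy the first row of each day (the 09:00 rows)
new_rows = df[df["date_time"].dt.hour == 9].copy()

# Move the copies back one hour, to 08:00
new_rows["date_time"] = new_rows["date_time"] - pd.Timedelta(hours=1)

# Append the new rows and restore chronological order
new_df = pd.concat([df, new_rows]).sort_values(by="date_time", ignore_index=True)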

print(new_df)
# Output
                   date_time  value
0  2022-02-10 08:00:00+01:00  80.80
1  2022-02-10 09:00:00+01:00  80.80
2  2022-02-10 10:00:00+01:00  80.20
3  2022-02-10 11:00:00+01:00  80.20
4  2022-02-10 12:00:00+01:00  80.70
5  2022-02-10 13:00:00+01:00  81.10
6  2022-02-10 14:00:00+01:00  81.00
7  2022-02-10 15:00:00+01:00  80.15
8  2022-02-10 16:00:00+01:00  80.25
9  2022-02-10 17:00:00+01:00  79.50
10 2022-02-11 08:00:00+01:00  79.50
11 2022-02-11 09:00:00+01:00  79.50
12 2022-02-11 10:00:00+01:00  79.00
13 2022-02-11 11:00:00+01:00  78.70
14 2022-02-11 12:00:00+01:00  78.90
15 2022-02-11 13:00:00+01:00  79.00
16 2022-02-11 14:00:00+01:00  79.05
17 2022-02-11 15:00:00+01:00  79.40
18 2022-02-11 16:00:00+01:00  79.85
19 2022-02-11 17:00:00+01:00  79.50
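
Note that the code above copies each day's own 09:00 value. It matches the requested result for 2022-02-11 only because that day happens to open at the previous day's last value (79.50). If the 08:00 row has to carry the last value of the previous day, a minimal sketch could look like this. The shortened sample data and the names day, last_per_day and prev_close are my own assumptions for illustration, and the first day gets no 08:00 row because it has no previous day in the data:

```python
import pandas as pd

# Shortened sample data (assumption: first and last row of each day only)
df = pd.DataFrame(
    {
        "date_time": pd.to_datetime(
            [
                "2022-02-10 09:00:00+01:00",
                "2022-02-10 17:00:00+01:00",
                "2022-02-11 09:00:00+01:00",
                "2022-02-11 17:00:00+01:00",
            ]
        ),
        "value": [80.8, 79.5, 79.0, 79.5],
    }
)

# Make sure rows are in chronological order so "last" means latest
df = df.sort_values("date_time", ignore_index=True)

# Midnight of each row's day, used as the grouping key
day = df["date_time"].dt.normalize()

# Last value of every day, then shifted down so each day sees the previous day's last value
last_per_day = df.groupby(day)["value"].last()
prev_close = last_per_day.shift(1)

# One new row per day at 08:00; the first day has no previous day and is dropped
new_rows = pd.DataFrame(
    {
        "date_time": prev_close.index + pd.Timedelta(hours=8),
        "value": prev_close.to_numpy(),
    }
).dropna(subset=["value"])

new_df = pd.concat([df, new_rows]).sort_values(by="date_time", ignore_index=True)

# The 2022-02-11 08:00 row carries 79.5, the last value of 2022-02-10,
# not 79.0, the 09:00 value of 2022-02-11
print(new_df)
```

This stays vectorized (one groupby plus one concat), so it avoids a Python-level loop over the days.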

Upvotes: 1
