Reputation: 631
I am struggling with one thing in Pandas. I am not sure how to add a new row at the beginning of each day without using for .. in, which takes a lot of time.
Below is a table where date_time is a timestamp created with to_datetime. For each day, before 09:00:00+01:00, I need to add a row at 08:00:00+01:00 that copies the last value from the previous day.
I will be grateful for any help.
date_time value
20437 2022-02-10 09:00:00+01:00 80.80 #<-before this
20438 2022-02-10 10:00:00+01:00 80.20
20439 2022-02-10 11:00:00+01:00 80.20
20440 2022-02-10 12:00:00+01:00 80.70
20441 2022-02-10 13:00:00+01:00 81.10
20442 2022-02-10 14:00:00+01:00 81.00
20443 2022-02-10 15:00:00+01:00 80.15
20444 2022-02-10 16:00:00+01:00 80.25
20445 2022-02-10 17:00:00+01:00 79.50
20446 2022-02-11 09:00:00+01:00 79.50 #<-before this
20447 2022-02-11 10:00:00+01:00 79.00
20448 2022-02-11 11:00:00+01:00 78.70
20449 2022-02-11 12:00:00+01:00 78.90
20450 2022-02-11 13:00:00+01:00 79.00
20451 2022-02-11 14:00:00+01:00 79.05
20452 2022-02-11 15:00:00+01:00 79.40
20453 2022-02-11 16:00:00+01:00 79.85
20454 2022-02-11 17:00:00+01:00 79.50
Upvotes: 1
Views: 71
Reputation: 13458
With the following dataframe:
import pandas as pd
df = pd.DataFrame(
{
"date_time": [
"2022-02-10 09:00:00+01:00",
"2022-02-10 10:00:00+01:00",
"2022-02-10 11:00:00+01:00",
"2022-02-10 12:00:00+01:00",
"2022-02-10 13:00:00+01:00",
"2022-02-10 14:00:00+01:00",
"2022-02-10 15:00:00+01:00",
"2022-02-10 16:00:00+01:00",
"2022-02-10 17:00:00+01:00",
"2022-02-11 09:00:00+01:00",
"2022-02-11 10:00:00+01:00",
"2022-02-11 11:00:00+01:00",
"2022-02-11 12:00:00+01:00",
"2022-02-11 13:00:00+01:00",
"2022-02-11 14:00:00+01:00",
"2022-02-11 15:00:00+01:00",
"2022-02-11 16:00:00+01:00",
"2022-02-11 17:00:00+01:00",
],
"value": [
80.8,
80.2,
80.2,
80.7,
81.1,
81.0,
80.15,
80.25,
79.5,
79.5,
79.0,
78.7,
78.9,
79.0,
79.05,
79.4,
79.85,
79.5,
],
}
)
You could try this, which copies each day's 09:00 row and shifts the copy back one hour:
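# Parse the strings into timezone-aware timestamps
df["date_time"] = pd.to_datetime(df["date_time"])
# Take the 09:00 row of each day (copy to avoid modifying a view of df)
new_rows = df[df["date_time"].dt.hour == 9].copy()
# Shift the copied rows back one hour, to 08:00
new_rows["date_time"] = new_rows["date_time"] - pd.Timedelta(hours=1)
# Append the new rows and restore chronological order
new_df = pd.concat([df, new_rows]).sort_values(by="date_time", ignore_index=True)
print(new_df)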
# Output
date_time value
0 2022-02-10 08:00:00+01:00 80.80
1 2022-02-10 09:00:00+01:00 80.80
2 2022-02-10 10:00:00+01:00 80.20
3 2022-02-10 11:00:00+01:00 80.20
4 2022-02-10 12:00:00+01:00 80.70
5 2022-02-10 13:00:00+01:00 81.10
6 2022-02-10 14:00:00+01:00 81.00
7 2022-02-10 15:00:00+01:00 80.15
8 2022-02-10 16:00:00+01:00 80.25
9 2022-02-10 17:00:00+01:00 79.50
10 2022-02-11 08:00:00+01:00 79.50
11 2022-02-11 09:00:00+01:00 79.50
12 2022-02-11 10:00:00+01:00 79.00
13 2022-02-11 11:00:00+01:00 78.70
14 2022-02-11 12:00:00+01:00 78.90
15 2022-02-11 13:00:00+01:00 79.00
16 2022-02-11 14:00:00+01:00 79.05
17 2022-02-11 15:00:00+01:00 79.40
18 2022-02-11 16:00:00+01:00 79.85
19 2022-02-11 17:00:00+01:00 79.50
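Note that the 08:00 rows above carry the 09:00 value of the same day. If the 08:00 row should instead hold the last value of the previous day, as the question describes, something along these lines might work. This is only a sketch: the shortened sample data is made up for illustration, it assumes a fixed UTC offset (no daylight-saving changes) and at least one row per day, and it drops the 08:00 row for the first day because there is no previous day to copy from.

```python
import pandas as pd

# Hypothetical shortened sample with the same layout as the question's table
df = pd.DataFrame(
    {
        "date_time": pd.to_datetime(
            [
                "2022-02-10 09:00:00+01:00",
                "2022-02-10 17:00:00+01:00",
                "2022-02-11 09:00:00+01:00",
                "2022-02-11 17:00:00+01:00",
            ]
        ),
        "value": [80.80, 79.50, 79.00, 79.85],
    }
)

# Make sure rows are in chronological order so "last" means latest
df = df.sort_values("date_time", ignore_index=True)

# Midnight of each row's calendar day (timezone is preserved)
day = df["date_time"].dt.normalize()

# Last value recorded on each day, indexed by that day
last_per_day = df.groupby(day)["value"].last()

# One candidate row per day at 08:00, holding the previous day's last value
new_rows = pd.DataFrame(
    {
        "date_time": last_per_day.index + pd.Timedelta(hours=8),
        "value": last_per_day.shift(1).to_numpy(),
    }
)

# The first day has no previous day, so its value is NaN; drop that row
new_rows = new_rows.dropna(subset=["value"])

# Append the new rows and restore chronological order
new_df = pd.concat([df, new_rows]).sort_values("date_time", ignore_index=True)
print(new_df)
```

With this sample, a single row is added at 2022-02-11 08:00:00+01:00 with the value 79.50, taken from 2022-02-10 17:00:00+01:00. If the first day should also get an 08:00 row, the NaN could be filled with some chosen default instead of being dropped.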
Upvotes: 1