I have a DataFrame with a column named DateTime whose values are recorded every 5 seconds. However, a few rows are missing, which can be identified from the time difference between the previous and the current row. I want to insert the missing rows and fill the other columns with the previous row's values.
My sample DataFrame looks like this:
DateTime Price
2022-03-04 09:15:00 34526.00
2022-03-04 09:15:05 34487.00
2022-03-04 09:15:10 34470.00
2022-03-04 09:15:20 34466.00
2022-03-04 09:15:45 34448.00
The desired result:
DateTime Price
2022-03-04 09:15:00 34526.00
2022-03-04 09:15:05 34487.00
2022-03-04 09:15:10 34470.00
2022-03-04 09:15:15 34470.00 <----Insert Row and keep Price same as previous row
2022-03-04 09:15:20 34466.00
2022-03-04 09:15:25 34466.00 <----Insert Row and keep Price same as previous row
2022-03-04 09:15:30 34466.00 <----Insert Row and keep Price same as previous row
2022-03-04 09:15:35 34466.00 <----Insert Row and keep Price same as previous row
2022-03-04 09:15:40 34466.00 <----Insert Row and keep Price same as previous row
2022-03-04 09:15:45 34448.00
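The gaps described in the question can be located with diff() on the DateTime column before anything is filled. A minimal sketch, assuming the sample data above with DateTime parsed to datetime64; the names step, delta and gaps are illustrative only:

```python
import pandas as pd

# Sample data from the question, with DateTime parsed to datetime64
df = pd.DataFrame({
    "DateTime": pd.to_datetime([
        "2022-03-04 09:15:00",
        "2022-03-04 09:15:05",
        "2022-03-04 09:15:10",
        "2022-03-04 09:15:20",
        "2022-03-04 09:15:45",
    ]),
    "Price": [34526.0, 34487.0, 34470.0, 34466.0, 34448.0],
})

step = pd.Timedelta(seconds=5)

# Time difference to the previous row (NaT for the first row, which compares as False)
delta = df["DateTime"].diff()
mask = delta > step

# Rows preceded by a gap, with the number of 5-second rows missing before each one
gaps = df[mask].assign(missing=delta[mask] // step - 1)
print(gaps)
```

Here the gap before 09:15:20 hides one row and the gap before 09:15:45 hides four, so the filled frame should end up with 5 + 5 = 10 rows.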
The pandas asfreq method is enough for this:
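df["DateTime"] = pd.to_datetime(df["DateTime"])  # asfreq needs a DatetimeIndex
(df
    .set_index("DateTime")
    .asfreq(freq="5s", method="ffill")  # regular 5 s grid; new rows copy the previous row
    .reset_index()
)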
DateTime Price
0 2022-03-04 09:15:00 34526.0
1 2022-03-04 09:15:05 34487.0
2 2022-03-04 09:15:10 34470.0
3 2022-03-04 09:15:15 34470.0
4 2022-03-04 09:15:20 34466.0
5 2022-03-04 09:15:25 34466.0
6 2022-03-04 09:15:30 34466.0
7 2022-03-04 09:15:35 34466.0
8 2022-03-04 09:15:40 34466.0
9 2022-03-04 09:15:45 34448.0
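A self-contained version of the asfreq approach, assuming the sample data from the question. DateTime has to be datetime64 before set_index, otherwise asfreq has no DatetimeIndex to work with:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "DateTime": pd.to_datetime([
        "2022-03-04 09:15:00",
        "2022-03-04 09:15:05",
        "2022-03-04 09:15:10",
        "2022-03-04 09:15:20",
        "2022-03-04 09:15:45",
    ]),
    "Price": [34526.0, 34487.0, 34470.0, 34466.0, 34448.0],
})

out = (
    df.set_index("DateTime")
      .asfreq(freq="5s", method="ffill")  # grid from first to last timestamp; new rows take the previous value
      .reset_index()                      # move DateTime back into a column
)
print(out)
```

Per the asfreq documentation, method="ffill" only fills the rows created by the reindex; NaN values that were already present in Price are left untouched.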
An alternative, using an outer join:
t = pd.date_range(df.DateTime.min(), df.DateTime.max(), freq="5s", name="DateTime")
pd.merge(pd.DataFrame(t), df, how="outer").ffill()
Output:
DateTime Price
0 2022-03-04 09:15:00 34526.0
1 2022-03-04 09:15:05 34487.0
2 2022-03-04 09:15:10 34470.0
3 2022-03-04 09:15:15 34470.0
4 2022-03-04 09:15:20 34466.0
5 2022-03-04 09:15:25 34466.0
6 2022-03-04 09:15:30 34466.0
7 2022-03-04 09:15:35 34466.0
8 2022-03-04 09:15:40 34466.0
9 2022-03-04 09:15:45 34448.0
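If you also need to know which rows were inserted, merge accepts indicator=True, which adds a _merge column ("both", "left_only" or "right_only"). A sketch building on the outer-join idea, assuming the sample data; the grid and inserted names are illustrative:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "DateTime": pd.to_datetime([
        "2022-03-04 09:15:00",
        "2022-03-04 09:15:05",
        "2022-03-04 09:15:10",
        "2022-03-04 09:15:20",
        "2022-03-04 09:15:45",
    ]),
    "Price": [34526.0, 34487.0, 34470.0, 34466.0, 34448.0],
})

# Complete 5-second grid spanning the observed range
grid = pd.DataFrame({
    "DateTime": pd.date_range(df["DateTime"].min(), df["DateTime"].max(), freq="5s")
})

# Outer join; sort explicitly so row order does not depend on the pandas version
out = (
    pd.merge(grid, df, on="DateTime", how="outer", indicator=True)
      .sort_values("DateTime", ignore_index=True)
)

# Rows that exist only in the grid are the ones that were missing
out["inserted"] = out["_merge"] == "left_only"
out = out.drop(columns="_merge")
out["Price"] = out["Price"].ffill()
print(out)
```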
Try resample, then ffill:
df['DateTime'] = pd.to_datetime(df['DateTime'])  # change to datetime dtype
df = df.set_index('DateTime')                     # move DateTime into index
df_out = df.resample('5s').ffill()                # resample to 5 s and forward fill
Output:
Price
DateTime
2022-03-04 09:15:00 34526.0
2022-03-04 09:15:05 34487.0
2022-03-04 09:15:10 34470.0
2022-03-04 09:15:15 34470.0
2022-03-04 09:15:20 34466.0
2022-03-04 09:15:25 34466.0
2022-03-04 09:15:30 34466.0
2022-03-04 09:15:35 34466.0
2022-03-04 09:15:40 34466.0
2022-03-04 09:15:45 34448.0
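resample('5s').ffill() builds the 5-second grid and, for each grid point, takes the most recent value at or before it. If the raw feed could ever contain more than one row inside a 5-second window, one variant worth considering is to aggregate each bin first and then forward-fill the empty bins. A sketch on the sample data; using last() as the aggregation is an assumption about what you want, not something the question specifies:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "DateTime": pd.to_datetime([
        "2022-03-04 09:15:00",
        "2022-03-04 09:15:05",
        "2022-03-04 09:15:10",
        "2022-03-04 09:15:20",
        "2022-03-04 09:15:45",
    ]),
    "Price": [34526.0, 34487.0, 34470.0, 34466.0, 34448.0],
})

out = (
    df.set_index("DateTime")
      .resample("5s")
      .last()          # last observed price in each 5 s bin; empty bins become NaN
      .ffill()         # fill empty bins from the previous bin
      .reset_index()
)
print(out)
```

On this data the result matches the other answers; the two only differ when a bin holds several rows or timestamps fall between grid points.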
Another option:
Create a new DataFrame containing the full range of timestamps you want:
df_2 = pd.DataFrame({
    # min()/max() do not rely on df being sorted or having a default 0..n-1 index
    "DateTime": pd.date_range(start=df["DateTime"].min(), end=df["DateTime"].max(), freq="5s")
})
Merge the new and original DataFrames with an outer join:
df = pd.merge(df, df_2, how="outer").sort_values("DateTime")
Fill the empty values with .ffill() (the older .fillna(method="ffill") does the same but is deprecated since pandas 2.1):
df = df.ffill()
Output:
DateTime Price
0 2022-03-04 09:15:00 34526.0
1 2022-03-04 09:15:05 34487.0
2 2022-03-04 09:15:10 34470.0
5 2022-03-04 09:15:15 34470.0
3 2022-03-04 09:15:20 34466.0
6 2022-03-04 09:15:25 34466.0
7 2022-03-04 09:15:30 34466.0
8 2022-03-04 09:15:35 34466.0
9 2022-03-04 09:15:40 34466.0
4 2022-03-04 09:15:45 34448.0
Resulting code:
import pandas as pd

df["DateTime"] = pd.to_datetime(df["DateTime"])  # both merge keys must be datetime64
df_2 = pd.DataFrame({
    "DateTime": pd.date_range(start=df["DateTime"].min(), end=df["DateTime"].max(), freq="5s")
})
df = pd.merge(df, df_2, how="outer").sort_values("DateTime")
df = df.ffill()
print(df)
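The merge, sort_values and ffill sequence can also be expressed with DataFrame.reindex, which inserts the missing timestamps in sorted order in a single step. A sketch wrapping that in a reusable function; fill_missing_rows, time_col and freq are hypothetical names, and the function assumes timestamps are unique, because reindex raises on a duplicated index:

```python
import pandas as pd


def fill_missing_rows(df, time_col="DateTime", freq="5s"):
    """Hypothetical helper: insert missing timestamps at a fixed frequency
    and forward-fill every other column from the previous row."""
    out = df.copy()
    out[time_col] = pd.to_datetime(out[time_col])
    # Every timestamp that should exist between the first and last observation
    full = pd.date_range(out[time_col].min(), out[time_col].max(), freq=freq, name=time_col)
    return (
        out.set_index(time_col)
           .reindex(full)   # missing timestamps become all-NaN rows, already in order
           .ffill()         # copy values down from the previous row
           .reset_index()   # DateTime back to a column, fresh 0..n-1 index
    )


# Sample data from the question, with DateTime still as strings
df = pd.DataFrame({
    "DateTime": [
        "2022-03-04 09:15:00",
        "2022-03-04 09:15:05",
        "2022-03-04 09:15:10",
        "2022-03-04 09:15:20",
        "2022-03-04 09:15:45",
    ],
    "Price": [34526.0, 34487.0, 34470.0, 34466.0, 34448.0],
})

result = fill_missing_rows(df)
print(result)
```

Like the merge answers, ffill here runs over the whole frame, so any NaN already present in the original data is filled as well.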