Reputation: 13
I have a pandas dataframe built like this, and I am trying to add rows to it:
import pandas as pd
import datetime as dt
d={'datetime':[dt.datetime(2018,3,1,0,0),dt.datetime(2018,3,1,0,10),dt.datetime(2018,3,1,0,40)],
'value':[4.,5.,1.]}
df=pd.DataFrame(d)
Which outputs:
datetime value
0 2018-03-01 00:00:00 4.0
1 2018-03-01 00:10:00 5.0
2 2018-03-01 00:40:00 1.0
What I want to do is add rows between 00:00:00 and 00:40:00 so that there is one row every 5 minutes, with NaN wherever I have no value. My desired output looks like this:
datetime value
0 2018-03-01 00:00:00 4.0
1 2018-03-01 00:05:00 NaN
2 2018-03-01 00:10:00 5.0
3 2018-03-01 00:15:00 NaN
4 2018-03-01 00:20:00 NaN
5 2018-03-01 00:25:00 NaN
6 2018-03-01 00:30:00 NaN
7 2018-03-01 00:35:00 NaN
8 2018-03-01 00:40:00 1.0
How do I get there?
Upvotes: 0
Reputation: 359
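First, create a dataframe that has the full target datetime index, then assign the values from your original dataframe to it:
import numpy as np
# Empty frame on the full 5-minute grid (9 steps from 00:00 to 00:40)
df1 = pd.DataFrame({'value': np.nan}, index=pd.date_range('2018-03-01 00:00:00',
periods=9, freq='5min'))
print(df1)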
#Output :
value
2018-03-01 00:00:00 NaN
2018-03-01 00:05:00 NaN
2018-03-01 00:10:00 NaN
2018-03-01 00:15:00 NaN
2018-03-01 00:20:00 NaN
2018-03-01 00:25:00 NaN
2018-03-01 00:30:00 NaN
2018-03-01 00:35:00 NaN
2018-03-01 00:40:00 NaN
Now, assuming your original dataframe is the second one (df2), set its datetime column as the index by adding this to the code above:
d={'datetime':
[dt.datetime(2018,3,1,0,0),dt.datetime(2018,3,1,0,10),dt.datetime(2018,3,1,0,40)],
'value':[4.,5.,1.]}
df2=pd.DataFrame(d)
df2.datetime = pd.to_datetime(df2.datetime)
df2.set_index('datetime',inplace=True)
print(df2)
#Output
value
datetime
2018-03-01 00:00:00 4.0
2018-03-01 00:10:00 5.0
2018-03-01 00:40:00 1.0
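Finally, assign the column; pandas aligns the values on the datetime index, so timestamps missing from df2 stay NaN: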
df1.value = df2.value
print(df1)
#output
value
2018-03-01 00:00:00 4.0
2018-03-01 00:05:00 NaN
2018-03-01 00:10:00 5.0
2018-03-01 00:15:00 NaN
2018-03-01 00:20:00 NaN
2018-03-01 00:25:00 NaN
2018-03-01 00:30:00 NaN
2018-03-01 00:35:00 NaN
2018-03-01 00:40:00 1.0
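The same idea can probably be written in one pass with reindex. The following is only a minimal sketch: the names full_index and out are made up for illustration, and it assumes the grid should run from the earliest to the latest timestamp in the data and that the timestamps are unique.

```python
import datetime as dt
import pandas as pd

df = pd.DataFrame({
    'datetime': [dt.datetime(2018, 3, 1, 0, 0),
                 dt.datetime(2018, 3, 1, 0, 10),
                 dt.datetime(2018, 3, 1, 0, 40)],
    'value': [4., 5., 1.],
})

# Hypothetical helper name: the complete 5-minute grid, derived from the data
full_index = pd.date_range(df['datetime'].min(), df['datetime'].max(), freq='5min')

# Reindex onto the grid; timestamps not present in df come out as NaN
out = (df.set_index('datetime')
         .reindex(full_index)
         .rename_axis('datetime')  # restore the index name before resetting
         .reset_index())
print(out)
```

Deriving the range from min() and max() avoids hard-coding the start time and periods=9, at the cost of assuming the first and last rows mark the ends of the range you want.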
Upvotes: 0
Reputation: 164623
You can use pd.DataFrame.resample:
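# Select 'value' explicitly so the result does not depend on whether
# the pandas version keeps the 'on' column in the aggregated output
df = df.resample('5min', on='datetime')['value'].first().reset_index()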
print(df)
datetime value
0 2018-03-01 00:00:00 4.0
1 2018-03-01 00:05:00 NaN
2 2018-03-01 00:10:00 5.0
3 2018-03-01 00:15:00 NaN
4 2018-03-01 00:20:00 NaN
5 2018-03-01 00:25:00 NaN
6 2018-03-01 00:30:00 NaN
7 2018-03-01 00:35:00 NaN
8 2018-03-01 00:40:00 1.0
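If no aggregation is needed, asfreq may be a simpler fit than resample. This is a hedged sketch rather than a drop-in replacement: it assumes the timestamps are unique and already fall on the 5-minute grid, since asfreq only reindexes and does not bin values the way resample(...).first() does. The name filled is made up for illustration.

```python
import datetime as dt
import pandas as pd

df = pd.DataFrame({
    'datetime': [dt.datetime(2018, 3, 1, 0, 0),
                 dt.datetime(2018, 3, 1, 0, 10),
                 dt.datetime(2018, 3, 1, 0, 40)],
    'value': [4., 5., 1.],
})

# asfreq builds a regular 5-minute index between the first and last
# timestamp and leaves NaN where the original frame had no row
filled = df.set_index('datetime').asfreq('5min').reset_index()
print(filled)
```

With timestamps that are off the grid (say 00:07), asfreq would drop those rows, whereas resample would assign them to a bin, so the choice depends on how clean the input is.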
Upvotes: 2