T. Corrie

Reputation: 13

Adding empty dataframe rows based on missing datetime values

I am trying to add rows to my pandas dataframe as such:

import pandas as pd
import datetime as dt

d={'datetime':[dt.datetime(2018,3,1,0,0),dt.datetime(2018,3,1,0,10),dt.datetime(2018,3,1,0,40)],
  'value':[4.,5.,1.]}

df=pd.DataFrame(d)

Which outputs:

             datetime  value
0 2018-03-01 00:00:00    4.0
1 2018-03-01 00:10:00    5.0
2 2018-03-01 00:40:00    1.0

What I want to do is add rows between 00:00:00 and 00:40:00 so that there is one row every 5 minutes. My desired output looks like this:

             datetime  value
0 2018-03-01 00:00:00    4.0
1 2018-03-01 00:05:00    NaN
2 2018-03-01 00:10:00    5.0
3 2018-03-01 00:15:00    NaN
4 2018-03-01 00:20:00    NaN
5 2018-03-01 00:25:00    NaN
6 2018-03-01 00:30:00    NaN
7 2018-03-01 00:35:00    NaN
8 2018-03-01 00:40:00    1.0

How do I get there?

Upvotes: 0

Views: 2215

Answers (2)

Boubacar Traoré

Reputation: 359

First, you can create a dataframe that has your final datetime index and is filled with NaN, and then assign the values from your original dataframe to it:

import numpy as np

df1 = pd.DataFrame({'value': np.nan}, index=pd.date_range('2018-03-01 00:00:00',
                   periods=9, freq='5min'))

print(df1)
#Output :
                   value
2018-03-01 00:00:00 NaN
2018-03-01 00:05:00 NaN
2018-03-01 00:10:00 NaN
2018-03-01 00:15:00 NaN
2018-03-01 00:20:00 NaN
2018-03-01 00:25:00 NaN
2018-03-01 00:30:00 NaN
2018-03-01 00:35:00 NaN
2018-03-01 00:40:00 NaN

Now, let's say your own dataframe is the second one. Set its datetime column as the index by adding this to the code above:

d={'datetime': 
[dt.datetime(2018,3,1,0,0),dt.datetime(2018,3,1,0,10),dt.datetime(2018,3,1,0,40)],
'value':[4.,5.,1.]}

df2=pd.DataFrame(d)
df2.datetime = pd.to_datetime(df2.datetime)
df2.set_index('datetime',inplace=True)
print(df2)

#Output
                   value
datetime    
2018-03-01 00:00:00 4.0
2018-03-01 00:10:00 5.0
2018-03-01 00:40:00 1.0

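Finally, assign the values. pandas aligns them on the datetime index, so timestamps missing from df2 stay NaN: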

df1.value = df2.value
print(df1)

#output
                   value
2018-03-01 00:00:00 4.0
2018-03-01 00:05:00 NaN
2018-03-01 00:10:00 5.0
2018-03-01 00:15:00 NaN
2018-03-01 00:20:00 NaN
2018-03-01 00:25:00 NaN
2018-03-01 00:30:00 NaN
2018-03-01 00:35:00 NaN
2018-03-01 00:40:00 1.0
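
The start time and the number of periods are hard-coded above. As a minimal sketch of the same idea, assuming the grid should simply run from the first to the last timestamp in the data, the index can be derived from the dataframe and applied with reindex. The names full_index and filled are illustrative only and are not part of the original answer.

```python
import datetime as dt
import pandas as pd

df = pd.DataFrame({
    'datetime': [dt.datetime(2018, 3, 1, 0, 0),
                 dt.datetime(2018, 3, 1, 0, 10),
                 dt.datetime(2018, 3, 1, 0, 40)],
    'value': [4., 5., 1.],
})

# Build the full 5-minute grid from the first to the last timestamp
# (assumption: the data's own min/max define the range).
full_index = pd.date_range(df['datetime'].min(), df['datetime'].max(), freq='5min')

# Reindex on the grid; timestamps with no matching row get NaN.
filled = (df.set_index('datetime')
            .reindex(full_index)
            .rename_axis('datetime')   # restore the index name before resetting
            .reset_index())

print(filled)
```

Note that reindex only keeps rows whose timestamps fall exactly on the grid, and it requires the datetime values to be unique.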

Upvotes: 0

jpp

Reputation: 164623

You can use pd.DataFrame.resample:

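# errors='ignore' covers pandas versions where resample(on=...) already
# leaves the 'datetime' column out of the result
df = df.resample('5min', on='datetime').first()\
       .drop(columns='datetime', errors='ignore').reset_index()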

print(df)

             datetime  value
0 2018-03-01 00:00:00    4.0
1 2018-03-01 00:05:00    NaN
2 2018-03-01 00:10:00    5.0
3 2018-03-01 00:15:00    NaN
4 2018-03-01 00:20:00    NaN
5 2018-03-01 00:25:00    NaN
6 2018-03-01 00:30:00    NaN
7 2018-03-01 00:35:00    NaN
8 2018-03-01 00:40:00    1.0
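
If every timestamp already sits exactly on a 5-minute boundary, as in the question, a possible alternative is asfreq, which only inserts the missing rows and does no aggregation. This is a sketch under that assumption and is not part of the original answer. Unlike resample, asfreq does not bin off-grid timestamps into the nearest interval, and it needs unique datetime values.

```python
import datetime as dt
import pandas as pd

df = pd.DataFrame({
    'datetime': [dt.datetime(2018, 3, 1, 0, 0),
                 dt.datetime(2018, 3, 1, 0, 10),
                 dt.datetime(2018, 3, 1, 0, 40)],
    'value': [4., 5., 1.],
})

# Move the timestamps into the index, expand to a fixed 5-minute frequency
# (new rows are filled with NaN), then turn the index back into a column.
filled = df.set_index('datetime').asfreq('5min').reset_index()

print(filled)
```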

Upvotes: 2
