Reputation: 1832
I have a dataframe with gaps in its time index. I want to upsample it to a 15-minute interval and keep NaN at the upsampled points. Any idea how to do this? My idea is to build an empty dataframe with the correct time-series index and then fill it with the good values. Here's a toy data set as an example:
import pandas as pd
df_data = '{"A":{"1503227700000":0.3,"1503228600000":0.64,"1503232200000":0.54,"1503234000000":0.7,"1503237600000":0.78},"B":{"1503227700000":0.46,"1503228600000":0.52,"1503232200000":0.54,"1503234000000":0.58,"1503237600000":0.58},"C":{"1503227700000":0.04,"1503228600000":0.04,"1503232200000":0.0,"1503234000000":0.1,"1503237600000":0.0},"D":{"1503227700000":0.0,"1503228600000":0.0,"1503232200000":0.0,"1503234000000":0.0,"1503237600000":0.0}}'
df_left = pd.read_json(df_data)
df_left
Out[1]:
A B C D
2017-08-20 11:15:00 0.30 0.46 0.04 0
2017-08-20 11:30:00 0.64 0.52 0.04 0
2017-08-20 12:30:00 0.54 0.54 0.00 0
2017-08-20 13:00:00 0.70 0.58 0.10 0
2017-08-20 14:00:00 0.78 0.58 0.00 0
time_ref = pd.date_range(start='2017-08-20 11:15:00', end='2017-08-20 14:00:00', freq='15min')
df_right = pd.DataFrame(index=time_ref, columns=['A','B','C','D'])
df_right
Out[54]:
A B C D
2017-08-20 11:15:00 NaN NaN NaN NaN
2017-08-20 11:30:00 NaN NaN NaN NaN
2017-08-20 11:45:00 NaN NaN NaN NaN
2017-08-20 12:00:00 NaN NaN NaN NaN
2017-08-20 12:15:00 NaN NaN NaN NaN
2017-08-20 12:30:00 NaN NaN NaN NaN
2017-08-20 12:45:00 NaN NaN NaN NaN
2017-08-20 13:00:00 NaN NaN NaN NaN
2017-08-20 13:15:00 NaN NaN NaN NaN
2017-08-20 13:30:00 NaN NaN NaN NaN
2017-08-20 13:45:00 NaN NaN NaN NaN
2017-08-20 14:00:00 NaN NaN NaN NaN
I want to join these so that I have a data frame like this:
df_data_join = '{"A": {"1503227700000": 0.30,"1503228600000": 0.64,"1503229500000": null,"1503230400000": null,"1503231300000": null,"1503232200000": 0.54,"1503233100000": null,"1503234000000": 0.70,"1503234900000": null,"1503235800000": null,"1503236700000": null,"1503237600000": 0.78},"B": {"1503227700000": 0.46,"1503228600000": 0.52,"1503229500000": null,"1503230400000": null,"1503231300000": null,"1503232200000": 0.54,"1503233100000": null,"1503234000000": 0.58,"1503234900000": null,"1503235800000": null,"1503236700000": null,"1503237600000": 0.58},"C": {"1503227700000": 0.04,"1503228600000": 0.04,"1503229500000": null,"1503230400000": null,"1503231300000": null,"1503232200000": 0.00,"1503233100000": null,"1503234000000": 0.10,"1503234900000": null,"1503235800000": null,"1503236700000": null,"1503237600000": 0.00},"D": {"1503227700000": 0,"1503228600000": 0,"1503229500000": null,"1503230400000": null,"1503231300000": null,"1503232200000": 0,"1503233100000": null,"1503234000000": 0,"1503234900000": null,"1503235800000": null,"1503236700000": null,"1503237600000": 0}}'
df_join = pd.read_json(df_data_join)
df_join
Out[10]:
A B C D
2017-08-20 11:15:00 0.30 0.46 0.04 0.0
2017-08-20 11:30:00 0.64 0.52 0.04 0.0
2017-08-20 11:45:00 NaN NaN NaN NaN
2017-08-20 12:00:00 NaN NaN NaN NaN
2017-08-20 12:15:00 NaN NaN NaN NaN
2017-08-20 12:30:00 0.54 0.54 0.00 0.0
2017-08-20 12:45:00 NaN NaN NaN NaN
2017-08-20 13:00:00 0.70 0.58 0.10 0.0
2017-08-20 13:15:00 NaN NaN NaN NaN
2017-08-20 13:30:00 NaN NaN NaN NaN
2017-08-20 13:45:00 NaN NaN NaN NaN
2017-08-20 14:00:00 0.78 0.58 0.00 0.0
I've tried merge, concat, replace, interpolate, and upsample, but none of them get me there. I need the NaN values in the dataframe so I can identify where data is missing in later processing.
Any help would be greatly appreciated.
Upvotes: 0
Views: 33
Reputation: 29635
You can use resample with asfreq directly on the data with missing time indexes, such as:
print(pd.read_json(df_data).resample('15min').asfreq())
A B C D
2017-08-20 11:15:00 0.30 0.46 0.04 0.0
2017-08-20 11:30:00 0.64 0.52 0.04 0.0
2017-08-20 11:45:00 NaN NaN NaN NaN
2017-08-20 12:00:00 NaN NaN NaN NaN
2017-08-20 12:15:00 NaN NaN NaN NaN
2017-08-20 12:30:00 0.54 0.54 0.00 0.0
2017-08-20 12:45:00 NaN NaN NaN NaN
2017-08-20 13:00:00 0.70 0.58 0.10 0.0
2017-08-20 13:15:00 NaN NaN NaN NaN
2017-08-20 13:30:00 NaN NaN NaN NaN
2017-08-20 13:45:00 NaN NaN NaN NaN
2017-08-20 14:00:00 0.78 0.58 0.00 0.0
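For anyone who wants to run this without the JSON string, here is a minimal self-contained sketch of the same resample/asfreq idea. It rebuilds only columns A and B of the toy data by hand, and the names df_up and missing are made up for this illustration; '15min' is used as the frequency alias because newer pandas versions deprecate '15T'.

```python
import pandas as pd

# Same timestamps as the toy data in the question (columns A and B only).
idx = pd.to_datetime([
    "2017-08-20 11:15:00",
    "2017-08-20 11:30:00",
    "2017-08-20 12:30:00",
    "2017-08-20 13:00:00",
    "2017-08-20 14:00:00",
])
df_left = pd.DataFrame(
    {"A": [0.30, 0.64, 0.54, 0.70, 0.78],
     "B": [0.46, 0.52, 0.54, 0.58, 0.58]},
    index=idx,
)

# Upsample to a regular 15-minute grid. asfreq() does no filling,
# so every slot without an original row is left as NaN.
df_up = df_left.resample("15min").asfreq()

# Boolean mask marking the rows that were absent from the original data
# (hypothetical helper name, useful for the "later processing" step).
missing = df_up["A"].isna()

print(df_up)
print(int(missing.sum()), "rows were inserted as NaN")
```

The mask can then be used to select or flag the gaps, for example df_up[missing].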
If you absolutely want to work with df_right, then using fillna with df_left would do the same:
df_right.fillna(df_left)
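A rough sketch of the fillna route, again with hand-built toy data (columns A and D only). Two things here are my own additions rather than part of the question: df_right is created with dtype=float so the result does not end up with object-dtype columns, and reindex is shown as a possible shortcut that skips the empty frame altogether.

```python
import pandas as pd

# Sparse data, same timestamps as the question (columns A and D only).
idx = pd.to_datetime([
    "2017-08-20 11:15:00",
    "2017-08-20 11:30:00",
    "2017-08-20 12:30:00",
    "2017-08-20 13:00:00",
    "2017-08-20 14:00:00",
])
df_left = pd.DataFrame(
    {"A": [0.30, 0.64, 0.54, 0.70, 0.78],
     "D": [0.0, 0.0, 0.0, 0.0, 0.0]},
    index=idx,
)

# Empty reference frame on the full 15-minute grid, as in the question.
# dtype=float is an assumption added here to keep the columns numeric.
time_ref = pd.date_range(start="2017-08-20 11:15:00",
                         end="2017-08-20 14:00:00", freq="15min")
df_right = pd.DataFrame(index=time_ref, columns=["A", "D"], dtype=float)

# fillna with a DataFrame aligns on index and columns, so only the
# timestamps present in df_left receive values; the rest stay NaN.
filled = df_right.fillna(df_left)

# Possible shortcut: reindex the sparse data onto the full grid directly.
reindexed = df_left.reindex(time_ref)

print(filled)
```

Both filled and reindexed should carry NaN in the same positions, so either can feed the later gap-detection step.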
Upvotes: 1