How do I join two DataFrames in Python and preserve the NaNs in the result?

Question

I have a DataFrame with gaps in its time index. I want to upsample it to a 15-minute interval and keep NaN at the upsampled points. Any idea how to do this? My idea is to build an empty DataFrame with the complete time-series index and then fill in the known values. Here's a toy data set as an example:

import pandas as pd
df_data = '{"A":{"1503227700000":0.3,"1503228600000":0.64,"1503232200000":0.54,"1503234000000":0.7,"1503237600000":0.78},"B":{"1503227700000":0.46,"1503228600000":0.52,"1503232200000":0.54,"1503234000000":0.58,"1503237600000":0.58},"C":{"1503227700000":0.04,"1503228600000":0.04,"1503232200000":0.0,"1503234000000":0.1,"1503237600000":0.0},"D":{"1503227700000":0.0,"1503228600000":0.0,"1503232200000":0.0,"1503234000000":0.0,"1503237600000":0.0}}'
df_left = pd.read_json(df_data)

df_left
Out[1]: 
                        A     B     C  D
2017-08-20 11:15:00  0.30  0.46  0.04  0
2017-08-20 11:30:00  0.64  0.52  0.04  0
2017-08-20 12:30:00  0.54  0.54  0.00  0
2017-08-20 13:00:00  0.70  0.58  0.10  0
2017-08-20 14:00:00  0.78  0.58  0.00  0

time_ref = pd.date_range(start='2017-08-20 11:15:00', end='2017-08-20 14:00:00', freq='15min')
df_right = pd.DataFrame(index=time_ref, columns=['A','B','C','D'])

df_right
Out[54]: 
                       A    B    C    D
2017-08-20 11:15:00  NaN  NaN  NaN  NaN
2017-08-20 11:30:00  NaN  NaN  NaN  NaN
2017-08-20 11:45:00  NaN  NaN  NaN  NaN
2017-08-20 12:00:00  NaN  NaN  NaN  NaN
2017-08-20 12:15:00  NaN  NaN  NaN  NaN
2017-08-20 12:30:00  NaN  NaN  NaN  NaN
2017-08-20 12:45:00  NaN  NaN  NaN  NaN
2017-08-20 13:00:00  NaN  NaN  NaN  NaN
2017-08-20 13:15:00  NaN  NaN  NaN  NaN
2017-08-20 13:30:00  NaN  NaN  NaN  NaN
2017-08-20 13:45:00  NaN  NaN  NaN  NaN
2017-08-20 14:00:00  NaN  NaN  NaN  NaN

I want to join these so that I have a data frame like this:

df_data_join =  '{"A": {"1503227700000": 0.30,"1503228600000": 0.64,"1503229500000": null,"1503230400000": null,"1503231300000": null,"1503232200000": 0.54,"1503233100000": null,"1503234000000": 0.70,"1503234900000": null,"1503235800000": null,"1503236700000": null,"1503237600000": 0.78},"B": {"1503227700000": 0.46,"1503228600000": 0.52,"1503229500000": null,"1503230400000": null,"1503231300000": null,"1503232200000": 0.54,"1503233100000": null,"1503234000000": 0.58,"1503234900000": null,"1503235800000": null,"1503236700000": null,"1503237600000": 0.58},"C": {"1503227700000": 0.04,"1503228600000": 0.04,"1503229500000": null,"1503230400000": null,"1503231300000": null,"1503232200000": 0.00,"1503233100000": null,"1503234000000": 0.10,"1503234900000": null,"1503235800000": null,"1503236700000": null,"1503237600000": 0.00},"D": {"1503227700000": 0,"1503228600000": 0,"1503229500000": null,"1503230400000": null,"1503231300000": null,"1503232200000": 0,"1503233100000": null,"1503234000000": 0,"1503234900000": null,"1503235800000": null,"1503236700000": null,"1503237600000": 0}}'


df_join = pd.read_json(df_data_join)

df_join
Out[10]: 
                        A     B     C    D
2017-08-20 11:15:00  0.30  0.46  0.04  0.0
2017-08-20 11:30:00  0.64  0.52  0.04  0.0
2017-08-20 11:45:00   NaN   NaN   NaN  NaN
2017-08-20 12:00:00   NaN   NaN   NaN  NaN
2017-08-20 12:15:00   NaN   NaN   NaN  NaN
2017-08-20 12:30:00  0.54  0.54  0.00  0.0
2017-08-20 12:45:00   NaN   NaN   NaN  NaN
2017-08-20 13:00:00  0.70  0.58  0.10  0.0
2017-08-20 13:15:00   NaN   NaN   NaN  NaN
2017-08-20 13:30:00   NaN   NaN   NaN  NaN
2017-08-20 13:45:00   NaN   NaN   NaN  NaN
2017-08-20 14:00:00  0.78  0.58  0.00  0.0

I've tried merge, concat, replace, interpolate, and upsampling, but none of them gets me there. I need the NaN values in the DataFrame so that later processing can identify where data is missing.

Any help would be greatly appreciated.
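
For reference, here is a minimal sketch of one way the desired frame might be produced. It assumes df_left has a DatetimeIndex whose timestamps already fall on the 15-minute grid, and it rebuilds the toy data by hand instead of parsing the JSON string. The names idx and df_resampled are introduced here for illustration only. Reindexing against time_ref keeps the rows that exist and leaves every other row as NaN, so the empty df_right is not needed.

```python
import pandas as pd

# Rebuild the toy frame by hand (same values as df_left in the question).
idx = pd.to_datetime([
    "2017-08-20 11:15:00", "2017-08-20 11:30:00", "2017-08-20 12:30:00",
    "2017-08-20 13:00:00", "2017-08-20 14:00:00",
])
df_left = pd.DataFrame(
    {
        "A": [0.30, 0.64, 0.54, 0.70, 0.78],
        "B": [0.46, 0.52, 0.54, 0.58, 0.58],
        "C": [0.04, 0.04, 0.00, 0.10, 0.00],
        "D": [0.0, 0.0, 0.0, 0.0, 0.0],
    },
    index=idx,
)

# The complete 15-minute index, as in the question.
time_ref = pd.date_range(start="2017-08-20 11:15:00",
                         end="2017-08-20 14:00:00", freq="15min")

# reindex keeps rows whose timestamps exist in df_left;
# timestamps that are missing from df_left become all-NaN rows.
df_join = df_left.reindex(time_ref)

# Alternative without building time_ref: resample to 15 minutes and
# take the values as-is, with no aggregation or filling.
df_resampled = df_left.resample("15min").asfreq()

print(df_join)
```

Both calls leave the gaps as NaN, so a later step can locate the missing rows with df_join.isna(). If the original timestamps did not sit exactly on the 15-minute grid, reindex would drop them, and a different approach would be needed.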
