P Moran

Reputation: 1832

How do I join two DataFrames in Python and preserve the NaNs in the result?

I have a DataFrame with gaps in its time index. I want to upsample it to a 15-minute interval and keep NaN at the upsampled points. Any idea how to do this? My idea is to build an empty DataFrame with the correct time-series index and then fill it with the known values. Here's a toy data set as an example:

import pandas as pd
df_data = '{"A":{"1503227700000":0.3,"1503228600000":0.64,"1503232200000":0.54,"1503234000000":0.7,"1503237600000":0.78},"B":{"1503227700000":0.46,"1503228600000":0.52,"1503232200000":0.54,"1503234000000":0.58,"1503237600000":0.58},"C":{"1503227700000":0.04,"1503228600000":0.04,"1503232200000":0.0,"1503234000000":0.1,"1503237600000":0.0},"D":{"1503227700000":0.0,"1503228600000":0.0,"1503232200000":0.0,"1503234000000":0.0,"1503237600000":0.0}}'

df_left = pd.read_json(df_data)

df_left
Out[1]: 
                        A     B     C  D
2017-08-20 11:15:00  0.30  0.46  0.04  0
2017-08-20 11:30:00  0.64  0.52  0.04  0
2017-08-20 12:30:00  0.54  0.54  0.00  0
2017-08-20 13:00:00  0.70  0.58  0.10  0
2017-08-20 14:00:00  0.78  0.58  0.00  0

time_ref = pd.date_range(start='2017-08-20 11:15:00', end='2017-08-20 14:00:00', freq='15min')
df_right = pd.DataFrame(index=time_ref, columns=['A','B','C','D'])

df_right
Out[54]: 
                       A    B    C    D
2017-08-20 11:15:00  NaN  NaN  NaN  NaN
2017-08-20 11:30:00  NaN  NaN  NaN  NaN
2017-08-20 11:45:00  NaN  NaN  NaN  NaN
2017-08-20 12:00:00  NaN  NaN  NaN  NaN
2017-08-20 12:15:00  NaN  NaN  NaN  NaN
2017-08-20 12:30:00  NaN  NaN  NaN  NaN
2017-08-20 12:45:00  NaN  NaN  NaN  NaN
2017-08-20 13:00:00  NaN  NaN  NaN  NaN
2017-08-20 13:15:00  NaN  NaN  NaN  NaN
2017-08-20 13:30:00  NaN  NaN  NaN  NaN
2017-08-20 13:45:00  NaN  NaN  NaN  NaN
2017-08-20 14:00:00  NaN  NaN  NaN  NaN

I want to join these so that I end up with a DataFrame like this:

df_data_join =  '{"A": {"1503227700000": 0.30,"1503228600000": 0.64,"1503229500000": null,"1503230400000": null,"1503231300000": null,"1503232200000": 0.54,"1503233100000": null,"1503234000000": 0.70,"1503234900000": null,"1503235800000": null,"1503236700000": null,"1503237600000": 0.78},"B": {"1503227700000": 0.46,"1503228600000": 0.52,"1503229500000": null,"1503230400000": null,"1503231300000": null,"1503232200000": 0.54,"1503233100000": null,"1503234000000": 0.58,"1503234900000": null,"1503235800000": null,"1503236700000": null,"1503237600000": 0.58},"C": {"1503227700000": 0.04,"1503228600000": 0.04,"1503229500000": null,"1503230400000": null,"1503231300000": null,"1503232200000": 0.00,"1503233100000": null,"1503234000000": 0.10,"1503234900000": null,"1503235800000": null,"1503236700000": null,"1503237600000": 0.00},"D": {"1503227700000": 0,"1503228600000": 0,"1503229500000": null,"1503230400000": null,"1503231300000": null,"1503232200000": 0,"1503233100000": null,"1503234000000": 0,"1503234900000": null,"1503235800000": null,"1503236700000": null,"1503237600000": 0}}'


df_join = pd.read_json(df_data_join)

df_join
Out[10]: 
                        A     B     C    D
2017-08-20 11:15:00  0.30  0.46  0.04  0.0
2017-08-20 11:30:00  0.64  0.52  0.04  0.0
2017-08-20 11:45:00   NaN   NaN   NaN  NaN
2017-08-20 12:00:00   NaN   NaN   NaN  NaN
2017-08-20 12:15:00   NaN   NaN   NaN  NaN
2017-08-20 12:30:00  0.54  0.54  0.00  0.0
2017-08-20 12:45:00   NaN   NaN   NaN  NaN
2017-08-20 13:00:00  0.70  0.58  0.10  0.0
2017-08-20 13:15:00   NaN   NaN   NaN  NaN
2017-08-20 13:30:00   NaN   NaN   NaN  NaN
2017-08-20 13:45:00   NaN   NaN   NaN  NaN
2017-08-20 14:00:00  0.78  0.58  0.00  0.0

I've tried merge, concat, replace, interpolate, and upsampling, but none of them gets me there. I need the NaN values in the DataFrame so that later processing can identify where data is missing.

Any help would be greatly appreciated.

Upvotes: 0

Views: 33

Answers (1)

Ben.T

Reputation: 29635

You can use resample with asfreq directly on the data with the missing time indexes, for example:

print(pd.read_json(df_data).resample('15min').asfreq())
                       A     B     C    D
2017-08-20 11:15:00  0.30  0.46  0.04  0.0
2017-08-20 11:30:00  0.64  0.52  0.04  0.0
2017-08-20 11:45:00   NaN   NaN   NaN  NaN
2017-08-20 12:00:00   NaN   NaN   NaN  NaN
2017-08-20 12:15:00   NaN   NaN   NaN  NaN
2017-08-20 12:30:00  0.54  0.54  0.00  0.0
2017-08-20 12:45:00   NaN   NaN   NaN  NaN
2017-08-20 13:00:00  0.70  0.58  0.10  0.0
2017-08-20 13:15:00   NaN   NaN   NaN  NaN
2017-08-20 13:30:00   NaN   NaN   NaN  NaN
2017-08-20 13:45:00   NaN   NaN   NaN  NaN
2017-08-20 14:00:00  0.78  0.58  0.00  0.0
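
For reference, here is a minimal self-contained sketch of the same idea. It assumes the sparse frame is built by hand rather than parsed from the JSON string, and it keeps only columns A and B from the toy data to stay short; the variable names upsampled and missing are just for illustration.

```python
import pandas as pd

# Rows that actually exist in the question's toy data (columns A and B only).
idx = pd.to_datetime([
    "2017-08-20 11:15", "2017-08-20 11:30", "2017-08-20 12:30",
    "2017-08-20 13:00", "2017-08-20 14:00",
])
df_left = pd.DataFrame(
    {"A": [0.30, 0.64, 0.54, 0.70, 0.78],
     "B": [0.46, 0.52, 0.54, 0.58, 0.58]},
    index=idx,
)

# asfreq() only places the existing rows on the 15-minute grid; it does not
# interpolate or forward-fill, so the newly created slots stay NaN.
upsampled = df_left.resample("15min").asfreq()

# Boolean mask marking the grid points that were absent from the original data.
missing = upsampled["A"].isna()
print(upsampled)
```

The missing mask can then be used in later processing to tell real measurements apart from the gaps.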

If you absolutely want to work with df_right, then calling fillna with df_left gives the same result: df_right.fillna(df_left)
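
A short sketch of the df_right route, under a few assumptions: the frame is again built by hand with only columns A and D, and dtype=float is added when creating df_right (not in the question) so that the empty columns are float rather than object. The last line uses reindex, which is not part of the answer above; it is shown only as a closely related alternative that skips the empty frame.

```python
import pandas as pd

# Rows that actually exist in the question's toy data (columns A and D only).
idx = pd.to_datetime([
    "2017-08-20 11:15", "2017-08-20 11:30", "2017-08-20 12:30",
    "2017-08-20 13:00", "2017-08-20 14:00",
])
df_left = pd.DataFrame(
    {"A": [0.30, 0.64, 0.54, 0.70, 0.78],
     "D": [0.0, 0.0, 0.0, 0.0, 0.0]},
    index=idx,
)

# Full 15-minute grid, as in the question.
time_ref = pd.date_range(start="2017-08-20 11:15:00",
                         end="2017-08-20 14:00:00", freq="15min")

# Empty NaN-filled frame; dtype=float is an assumption added for this sketch.
df_right = pd.DataFrame(index=time_ref, columns=["A", "D"], dtype=float)

# fillna aligns on index and columns, so only timestamps present in df_left
# receive values; every other grid point stays NaN.
filled = df_right.fillna(df_left)

# Alternative (not from the answer): reindex df_left onto the grid directly.
reindexed = df_left.reindex(time_ref)
print(filled)
```

Both filled and reindexed should hold the known values at the original timestamps and NaN everywhere else.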

Upvotes: 1
