rogersPass
rogersPass

Reputation: 21

Pandas reindex turning all non-index columns to NaN

I have a csv file of weather station data which has non continuous timestamps:

logstamp               temp     rh     snow      wind          gust    wind_dir                         
2018-01-26 21:00:00   -10.120  63.93  207.1     4.018         9.806   173.900   
2018-01-26 22:00:00    -9.750  58.54  207.0     3.856        11.149   158.500   
2018-01-26 23:00:00    -9.710  60.92  206.9     6.505        13.759   159.100   
2018-01-27 00:00:00   -10.110  57.45  206.7     6.602        12.488   167.700   
2018-01-28 13:00:00    -7.574  84.90  212.4     5.594        15.736   134.100   
2018-01-28 14:00:00    -4.347  88.20  213.1     5.663        15.242   170.700   
2018-01-28 15:00:00    -1.360  89.30  213.0     4.896        19.051   175.300   

I would like to use the pandas reindex function to add rows where there are missing timestamps so I can interpolate data for the missing times. For example our weather station cuts out Jan 27 in the above table.

I tried using the reindex function with pandas. This results in new interpolated rows for the missing times, however it turns all of the original column data to NaN.

ts1 = pd.read_csv("mtmaya-2018-02-20.csv", index_col='logstamp', infer_datetime_format='TRUE') 

index = pd.date_range(ts1.index.min(),ts1.index.max(), freq="H")
ts1 = ts1.reindex(index)

                     air temp  rh  snow  wind spd  wind spd max  wind dir  \
2018-01-25 14:00:00       NaN NaN   NaN       NaN           NaN       NaN   
2018-01-25 15:00:00       NaN NaN   NaN       NaN           NaN       NaN   
2018-01-25 16:00:00       NaN NaN   NaN       NaN           NaN       NaN   

I assume I am missing something.

Upvotes: 0

Views: 259

Answers (1)

sacuL
sacuL

Reputation: 51345

I think you need to call interpolate. For your example dataframe:

# This is the same as what you had
index = pd.date_range(ts1.index.min(),ts1.index.max(), freq="H")

# Here, reindex as you had done before, but chain an 'interpolate' on top of that
ts1 = ts1.reindex(index).interpolate(method='time')

If you print the first 10 columns, you'll see that the NaNs for 2018-01-27 have been interpolated:

>>> ts1.head(10)
                          temp         rh        snow      wind       gust  \
2018-01-26 21:00:00 -10.120000  63.930000  207.100000  4.018000   9.806000   
2018-01-26 22:00:00  -9.750000  58.540000  207.000000  3.856000  11.149000   
2018-01-26 23:00:00  -9.710000  60.920000  206.900000  6.505000  13.759000   
2018-01-27 00:00:00 -10.110000  57.450000  206.700000  6.602000  12.488000   
2018-01-27 01:00:00 -10.041459  58.191892  206.854054  6.574757  12.575784   
2018-01-27 02:00:00  -9.972919  58.933784  207.008108  6.547514  12.663568   
2018-01-27 03:00:00  -9.904378  59.675676  207.162162  6.520270  12.751351   
2018-01-27 04:00:00  -9.835838  60.417568  207.316216  6.493027  12.839135   
2018-01-27 05:00:00  -9.767297  61.159459  207.470270  6.465784  12.926919   
2018-01-27 06:00:00  -9.698757  61.901351  207.624324  6.438541  13.014703   

                       wind_dir  
2018-01-26 21:00:00  173.900000  
2018-01-26 22:00:00  158.500000  
2018-01-26 23:00:00  159.100000  
2018-01-27 00:00:00  167.700000  
2018-01-27 01:00:00  166.791892  
2018-01-27 02:00:00  165.883784  
2018-01-27 03:00:00  164.975676  
2018-01-27 04:00:00  164.067568  
2018-01-27 05:00:00  163.159459  
2018-01-27 06:00:00  162.251351 

Upvotes: 1

Related Questions