user1642513

Python pandas resample added dates not present in the original data

I am using pandas to convert intraday data, stored in data_m, to daily data. For some reason resample added rows for days that were not present in the intraday data. For example, 1/8/2000 is not in the intraday data, yet the daily data contains a row for that date with NaN as the value. The resulting DatetimeIndex has 4729 entries, but only 3241 of them hold non-null values. Am I doing anything wrong?

data_m.resample('D', how = mean).head()
Out[13]: 
           x
2000-01-04 8803.879581
2000-01-05 8765.036649
2000-01-06 8893.156250
2000-01-07 8780.037433
2000-01-08 NaN

data_m.resample('D', how = mean)
Out[14]: 
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 4729 entries, 2000-01-04 00:00:00 to 2012-12-14 00:00:00
Freq: D
Data columns:
x    3241  non-null values
dtypes: float64(1)

Upvotes: 6

Views: 4543

Answers (2)

Andy Hayden

Reputation: 375565

What you are doing looks correct; pandas simply returns NaN for the mean of an empty array.

In [1]: Series().mean()
Out[1]: nan

resample converts the data to a regular time grid, so a day with no samples still gets a row, and the mean for that row is NaN.
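To make this concrete, here is a minimal sketch using the current resample API. The data is made up for illustration (two readings per day, with 2000-01-06 deliberately missing); only the name data_m and the column x are taken from the question.

```python
import pandas as pd

# Hypothetical intraday data: two readings on each of three days.
# 2000-01-06 is deliberately absent.
idx = pd.to_datetime([
    "2000-01-04 09:30", "2000-01-04 15:30",
    "2000-01-05 09:30", "2000-01-05 15:30",
    "2000-01-07 09:30", "2000-01-07 15:30",
])
data_m = pd.DataFrame({"x": [1.0, 3.0, 5.0, 7.0, 9.0, 11.0]}, index=idx)

# The mean of an empty Series is NaN.
empty_mean = pd.Series(dtype=float).mean()

# Daily resampling builds one bin per calendar day between the first
# and last timestamp, including days that have no samples.
daily = data_m.resample("D").mean()
print(daily)
```

The result has four rows, one per day from 2000-01-04 to 2000-01-07, and the row for 2000-01-06 is NaN because its bin is empty.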

Most of the time, having NaN isn't a problem. If it is, you can either use fill_method (for example 'ffill') or, if you really want to remove those rows, use dropna (not recommended):

data_m.resample('D', how = mean, fill_method='ffill')
data_m.resample('D', how = mean).dropna()

Update: Current pandas no longer accepts how or fill_method in resample. The modern equivalent seems to be:

In [21]: s.resample("D").mean().ffill()
Out[21]:
                      x
2000-01-04  8803.879581
2000-01-05  8765.036649
2000-01-06  8893.156250
2000-01-07  8780.037433
2000-01-08  8780.037433

In [22]: s.resample("D").mean().dropna()
Out[22]:
                      x
2000-01-04  8803.879581
2000-01-05  8765.036649
2000-01-06  8893.156250
2000-01-07  8780.037433
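The two options can be compared side by side on a small self-contained example. This is only a sketch on made-up data (2000-01-06 has no samples), not the asker's dataset.

```python
import pandas as pd

# Hypothetical intraday data with no samples on 2000-01-06.
idx = pd.to_datetime([
    "2000-01-04 09:30", "2000-01-04 15:30",
    "2000-01-05 09:30", "2000-01-05 15:30",
    "2000-01-07 09:30", "2000-01-07 15:30",
])
data_m = pd.DataFrame({"x": [1.0, 3.0, 5.0, 7.0, 9.0, 11.0]}, index=idx)

daily = data_m.resample("D").mean()

# Option 1: keep the regular daily grid and carry the last known
# value forward into the empty day.
filled = daily.ffill()

# Option 2: drop the empty days, which leaves an irregular index
# that only contains days present in the original data.
dropped = daily.dropna()

print(filled)
print(dropped)
```

With ffill the index stays regular, which matters if later steps assume one row per day. With dropna the index matches the days that actually had data, but it no longer has a fixed daily frequency.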

See resample docs.

Upvotes: 7

Garrett

Reputation: 49836

Prior to 0.10.0, pandas labeled resample bins with the right-most edge, which for daily resampling is the next day. Starting with 0.10.0, the default binning behavior for daily and higher frequencies changed to label='left', closed='left' to minimize this confusion. See http://pandas.pydata.org/pandas-docs/stable/whatsnew.html#api-changes for more information.
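The effect of the labeling convention can be reproduced in current pandas by passing label and closed explicitly. This sketch uses made-up timestamps and assumes that label='right', closed='right' is a fair stand-in for the old default described above.

```python
import pandas as pd

# Hypothetical intraday samples on two consecutive days.
idx = pd.to_datetime(["2000-01-04 09:30", "2000-01-04 15:30", "2000-01-05 09:30"])
s = pd.Series([1.0, 3.0, 5.0], index=idx)

# Current default for daily bins: each bin is labeled with its left edge,
# so samples taken on 2000-01-04 are reported under 2000-01-04.
left = s.resample("D", label="left", closed="left").mean()

# Right-edge labeling: the same samples are reported under the
# following day, 2000-01-05.
right = s.resample("D", label="right", closed="right").mean()

print(left)
print(right)
```

The values are the same in both results; only the dates they are filed under differ by one day.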

Upvotes: 1
