sweetdream

Reputation: 1419

Bug in resampling with pandas 0.8?

I am currently struggling with the resampling function in pandas 0.8.0b1.

For example, when I try to aggregate (using 'mean') 10min values to monthly values, the function seems to use the last day of data from one month in the mean of the next month...

Here is an example with a simple time series of 3 months of 10-minute data, with a constant value in each month: 1 in January, 2 in February and 3 in March.

The monthly means I get using df.resample('M',how='mean') are:

Out[454]: 
                   0
2012-01-31  1.000000
2012-02-29  1.965757
2012-03-31  2.967966
2012-04-30  3.000000

but I would like to get something like:

0
2012-02-01  1.000000
2012-03-01  2.000000
2012-04-01  3.000000

Here is the code:

import numpy as np
import pandas as pd

# Three months of 10-minute timestamps, each running through 23:50 on the last day
january = pd.date_range(pd.datetime(2012,1,1),pd.datetime(2012,1,31,23,50),freq='10min')
february = pd.date_range(pd.datetime(2012,2,1),pd.datetime(2012,2,29,23,50),freq='10min')
march = pd.date_range(pd.datetime(2012,3,1),pd.datetime(2012,3,31,23,50),freq='10min')
# Constant value per month: 1 for January, 2 for February, 3 for March
data_jan = np.zeros(np.size(january))+1
data_feb = np.zeros(np.size(february))+2
data_march = np.zeros(np.size(march))+3
df1 = pd.DataFrame(data_jan,index=january)
df2 = pd.DataFrame(data_feb,index=february)
df3 = pd.DataFrame(data_march,index=march)
df = pd.concat([df1,df2,df3])
# Monthly mean (pandas 0.8 syntax)
df.resample('M',how='mean')

If I now drop the last day of each month by ending each range at midnight:

january = pd.date_range(pd.datetime(2012,1,1),pd.datetime(2012,1,31,00,00),freq='10min')
february = pd.date_range(pd.datetime(2012,2,1),pd.datetime(2012,2,29,00,00),freq='10min')
march = pd.date_range(pd.datetime(2012,3,1),pd.datetime(2012,3,31,00,00),freq='10min')

I get (nearly) what I want:

Out[474]: 
            0
2012-01-31  1
2012-02-29  2
2012-03-31  3

Could you help me? Is it a bug?

Upvotes: 2

Views: 640

Answers (1)

Wes McKinney

Reputation: 105551

This is indeed a bug. There are two issues tracking it:

https://github.com/pydata/pandas/issues/1458

https://github.com/pydata/pandas/issues/1471

This should be fixed before pandas 0.8.0 is released. Note that this works correctly:

In [15]: df.resample('M', kind='period')
Out[15]: 
          0
Jan-2012  1
Feb-2012  2
Mar-2012  3
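For readers on a recent pandas release, where the `how=` argument used in the question is no longer available, the same period-based monthly mean can be sketched with a plain groupby. This is only an illustration, not the fix that went into pandas: the column name `value` and the way the test data is built are assumptions made for the example, and the grouping key is simply the index converted to monthly periods.

```python
import numpy as np
import pandas as pd

# Three months of 10-minute timestamps, as in the question.
index = pd.date_range("2012-01-01", "2012-03-31 23:50", freq="10min")

# Hypothetical test data: each row holds its month number (1.0, 2.0 or 3.0).
values = np.asarray(index.month, dtype=float)
df = pd.DataFrame({"value": values}, index=index)

# Group by calendar month expressed as a Period, so every timestamp
# (including those on the last day of a month) lands in its own month.
monthly = df.groupby(df.index.to_period("M")).mean()
print(monthly)
```

Grouping on periods sidesteps the question of which bin edge a timestamp belongs to, because each timestamp maps to exactly one calendar month.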

EDIT: I just fixed this in git master (both of the issues referenced above have been closed).

Upvotes: 3
