moustachio
moustachio

Reputation: 3004

Resampling with 'how=count' causing problems

I have a simple pandas dataframe that has measurements at various times:

                     volume
t
2013-10-13 02:45:00      17
2013-10-13 05:40:00      38
2013-10-13 09:30:00      29
2013-10-13 11:40:00      25
2013-10-13 12:50:00      11
2013-10-13 15:00:00      17
2013-10-13 17:10:00      15
2013-10-13 18:20:00      12
2013-10-13 20:30:00      20
2013-10-14 03:45:00       9
2013-10-14 06:40:00      30
2013-10-14 09:40:00      43
2013-10-14 11:05:00      10

I'm doing some basic resampling and plotting, such as the daily total volume, which works fine:

df.resample('D',how='sum').head()   

            volume
t
2013-10-13     184
2013-10-14     209
2013-10-15     197
2013-10-16     309
2013-10-17     317

But for some reason when I try do the total number of entries per day, it returns a a multiindex series instead of a dataframe:

df.resample('D',how='count').head()

2013-10-13  volume     9
2013-10-14  volume     9
2013-10-15  volume     7
2013-10-16  volume     9
2013-10-17  volume    10

I can fix the data so it's easily plotted with a simple unstack call, i.e. df.resample('D',how='count').unstack(), but why does calling resample with how='count' have a different behavior than with how='sum'?

Upvotes: 4

Views: 5576

Answers (1)

Karl D.
Karl D.

Reputation: 13757

It does appear the resample and count leads to some odd behavior in terms of how the resulting dataframe is structured (Well, at least up to 0.13.1). See here for a slightly different but related context: Count and Resampling with a mutli-ndex

You can use the same strategy here:

>>> df
                     volume
date                       
2013-10-13 02:45:00      17
2013-10-13 05:40:00      38
2013-10-13 09:30:00      29
2013-10-13 11:40:00      25
2013-10-13 12:50:00      11
2013-10-13 15:00:00      17
2013-10-13 17:10:00      15
2013-10-13 18:20:00      12
2013-10-13 20:30:00      20
2013-10-14 03:45:00       9
2013-10-14 06:40:00      30
2013-10-14 09:40:00      43
2013-10-14 11:05:00      10

So here is your issue:

>>> df.resample('D',how='count')

2013-10-13  volume    9
2013-10-14  volume    4

You can fix the issue by specifying that count applies to the volume column with a dict in the resample call:

>>> df.resample('D',how={'volume':'count'})

            volume
date              
2013-10-13       9
2013-10-14       4

Upvotes: 7

Related Questions