Reputation: 3004
I have a simple pandas dataframe that has measurements at various times:
volume
t
2013-10-13 02:45:00 17
2013-10-13 05:40:00 38
2013-10-13 09:30:00 29
2013-10-13 11:40:00 25
2013-10-13 12:50:00 11
2013-10-13 15:00:00 17
2013-10-13 17:10:00 15
2013-10-13 18:20:00 12
2013-10-13 20:30:00 20
2013-10-14 03:45:00 9
2013-10-14 06:40:00 30
2013-10-14 09:40:00 43
2013-10-14 11:05:00 10
I'm doing some basic resampling and plotting, such as the daily total volume, which works fine:
df.resample('D',how='sum').head()
volume
t
2013-10-13 184
2013-10-14 209
2013-10-15 197
2013-10-16 309
2013-10-17 317
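For anyone who wants to reproduce this, here is a minimal sketch that rebuilds only the thirteen rows shown above (so the second day is partial) and computes the daily total. It assumes a newer pandas release, where the aggregation is written as a method on the resampler rather than through the how= keyword; the variable names are just illustrative.

```python
import pandas as pd

# Rows copied from the sample above; only the first two days are available.
times = [
    "2013-10-13 02:45:00", "2013-10-13 05:40:00", "2013-10-13 09:30:00",
    "2013-10-13 11:40:00", "2013-10-13 12:50:00", "2013-10-13 15:00:00",
    "2013-10-13 17:10:00", "2013-10-13 18:20:00", "2013-10-13 20:30:00",
    "2013-10-14 03:45:00", "2013-10-14 06:40:00", "2013-10-14 09:40:00",
    "2013-10-14 11:05:00",
]
volumes = [17, 38, 29, 25, 11, 17, 15, 12, 20, 9, 30, 43, 10]

df = pd.DataFrame({"volume": volumes}, index=pd.DatetimeIndex(times, name="t"))

# Daily total volume; the result is a DataFrame with one row per day.
daily_total = df.resample("D").sum()
print(daily_total)
```

With only these rows, the first day sums to 184 as in the output above, while the second day comes out lower than 209 because the sample is truncated.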
But for some reason, when I try to get the total number of entries per day, it returns a multiindex series instead of a dataframe:
df.resample('D',how='count').head()
2013-10-13 volume 9
2013-10-14 volume 9
2013-10-15 volume 7
2013-10-16 volume 9
2013-10-17 volume 10
I can fix the data so it's easily plotted with a simple unstack call, i.e. df.resample('D',how='count').unstack(), but why does calling resample with how='count' have a different behavior than with how='sum'?
Upvotes: 4
Views: 5576
Reputation: 13757
It does appear that combining resample and count leads to some odd behavior in how the resulting dataframe is structured (well, at least up to 0.13.1). See here for a slightly different but related context: Count and Resampling with a multi-index
You can use the same strategy here:
>>> df
volume
date
2013-10-13 02:45:00 17
2013-10-13 05:40:00 38
2013-10-13 09:30:00 29
2013-10-13 11:40:00 25
2013-10-13 12:50:00 11
2013-10-13 15:00:00 17
2013-10-13 17:10:00 15
2013-10-13 18:20:00 12
2013-10-13 20:30:00 20
2013-10-14 03:45:00 9
2013-10-14 06:40:00 30
2013-10-14 09:40:00 43
2013-10-14 11:05:00 10
So here is your issue:
>>> df.resample('D',how='count')
2013-10-13 volume 9
2013-10-14 volume 4
You can fix the issue by specifying that count applies to the volume column with a dict in the resample call:
>>> df.resample('D',how={'volume':'count'})
volume
date
2013-10-13 9
2013-10-14 4
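As a hedged side note: in later pandas releases the how= keyword was deprecated and then removed, so the calls above will not run there. A rough equivalent, assuming such a newer version and using the same thirteen sample rows, might look like the sketch below. Both forms should give a regular dataframe with a volume column rather than a multiindex series.

```python
import pandas as pd

# Same sample rows as in the question (index named "date" as in this answer).
times = [
    "2013-10-13 02:45:00", "2013-10-13 05:40:00", "2013-10-13 09:30:00",
    "2013-10-13 11:40:00", "2013-10-13 12:50:00", "2013-10-13 15:00:00",
    "2013-10-13 17:10:00", "2013-10-13 18:20:00", "2013-10-13 20:30:00",
    "2013-10-14 03:45:00", "2013-10-14 06:40:00", "2013-10-14 09:40:00",
    "2013-10-14 11:05:00",
]
volumes = [17, 38, 29, 25, 11, 17, 15, 12, 20, 9, 30, 43, 10]
df = pd.DataFrame({"volume": volumes}, index=pd.DatetimeIndex(times, name="date"))

# Method-style count on the resampler: number of non-null entries per day.
daily_count = df.resample("D").count()

# Dict-style aggregation, mirroring the how={'volume': 'count'} workaround.
daily_count_dict = df.resample("D").agg({"volume": "count"})

print(daily_count)
print(daily_count_dict)
```

The dict form is mainly useful when different columns need different aggregations; for a single column the plain count() call is shorter.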
Upvotes: 7