Reputation: 115
I'm just getting started with Pandas and am trying to combine: Grouping my data by date, and counting the unique values in each group.
Here's what my data looks like:
User, Type
Datetime
2014-04-15 11:00:00, A, New
2014-04-15 12:00:00, B, Returning
2014-04-15 13:00:00, C, New
2014-04-20 14:00:00, D, New
2014-04-20 15:00:00, B, Returning
2014-04-20 16:00:00, B, Returning
2014-04-20 17:00:00, D, Returning
And here's what I would like to get to: Resample the datetime index to the day (which I can do), and also count the unique users for each day. I'm not interested in the 'Type' column yet.
Day, Unique Users
2014-04-15, 3
2014-04-20, 2
I'm trying df.user.resample('D', how='count').unique
but it doesn't seem to give me the right answer.
Upvotes: 7
Views: 9015
Reputation: 6526
I came across the same problem. Resample worked for me with nunique. The nice way with resample is that it makes it very simple to change the sample rate for example to hour or minutes and that the timestamp is kept as index.
df.user.resample('D').nunique()
Upvotes: 2
Reputation: 12029
I was running into the same problem. Karl D's answer works for some kind of reindexing -- on date, for example. but what if you want the index to be
Jan 2014
Feb 2014
March 2014
and then plot it as a timeseries?
Here's what I did:
df.user.resample('M',lambda x: x.nunique())
Upvotes: 0
Reputation: 13757
You don't need to do a resample to get the desired output in your question. I think you can get by with just a groupby
on date:
print df.groupby(df.index.date)['User'].nunique()
2014-04-15 3
2014-04-20 2
dtype: int64
And then if you want to you could resample to fill in the time series gaps after you count the unique users:
cnt = df.groupby(df.index.date)['User'].nunique()
cnt.index = cnt.index.to_datetime()
print cnt.resample('D')
2014-04-15 3
2014-04-16 NaN
2014-04-17 NaN
2014-04-18 NaN
2014-04-19 NaN
2014-04-20 2
Freq: D, dtype: float64
Upvotes: 8