Malcolm Bastien
Malcolm Bastien

Reputation: 115

Pandas: Count Unique Values after Resample

I'm just getting started with Pandas and am trying to combine: Grouping my data by date, and counting the unique values in each group.

Here's what my data looks like:

                  User, Type
Datetime
2014-04-15 11:00:00, A, New
2014-04-15 12:00:00, B, Returning
2014-04-15 13:00:00, C, New
2014-04-20 14:00:00, D, New
2014-04-20 15:00:00, B, Returning
2014-04-20 16:00:00, B, Returning
2014-04-20 17:00:00, D, Returning

And here's what I would like to get to: Resample the datetime index to the day (which I can do), and also count the unique users for each day. I'm not interested in the 'Type' column yet.

Day, Unique Users
2014-04-15, 3
2014-04-20, 2

I'm trying df.user.resample('D', how='count').unique but it doesn't seem to give me the right answer.

Upvotes: 7

Views: 9015

Answers (3)

mjspier
mjspier

Reputation: 6526

I came across the same problem. Resample worked for me with nunique. The nice way with resample is that it makes it very simple to change the sample rate for example to hour or minutes and that the timestamp is kept as index.

df.user.resample('D').nunique()

Upvotes: 2

exp1orer
exp1orer

Reputation: 12029

I was running into the same problem. Karl D's answer works for some kind of reindexing -- on date, for example. but what if you want the index to be

Jan 2014
Feb 2014
March 2014

and then plot it as a timeseries?

Here's what I did:

df.user.resample('M',lambda x: x.nunique())

Upvotes: 0

Karl D.
Karl D.

Reputation: 13757

You don't need to do a resample to get the desired output in your question. I think you can get by with just a groupby on date:

print df.groupby(df.index.date)['User'].nunique()

2014-04-15    3
2014-04-20    2
dtype: int64

And then if you want to you could resample to fill in the time series gaps after you count the unique users:

cnt = df.groupby(df.index.date)['User'].nunique()
cnt.index = cnt.index.to_datetime()
print cnt.resample('D')

2014-04-15     3
2014-04-16   NaN
2014-04-17   NaN
2014-04-18   NaN
2014-04-19   NaN
2014-04-20     2
Freq: D, dtype: float64

Upvotes: 8

Related Questions