FPL
FPL

Reputation: 466

Pandas GroupBy and add count of unique values as a new column

Problem

I'm trying to group a Pandas DataFrame by columns day and source, and then count unique instances of a 3rd (timestamp) for each group (to find out how many hits each source got per day), but I get this error (full traceback below): TypeError: 'method' object is not subscriptable

Attempts

My code is:

df['timestamp_count'] = df.groupby(["source", "day"]).agg({"timestamp": "nunique"})

And ideally I'd like to get a result like this (including the new column, timestamp_count:

day    source    timestamp                timestamp_count
1      facebook  2018-08-04 11:16:32.416  2
1      facebook  2019-01-03 10:25:38.216  2
1      twitter   2018-10-14 13:26:22.123  1
2      facebook  2019-01-30 12:16:32.416  1

When I run df.describe(), I see that source is an object but day is an integer. Could this be part of the problem? Having looked at other questions, it seems probable that it is along those lines, yet none of them seem to throw the same input/error combination.

Can anyone help a Python newbie? Many thanks!

Traceback

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-90-db5a002137ca> in <module>
----> 1 df['timestamp_count'] = (df.groupby["source", 
"day"]).agg({"timestamp": "nunique"})

TypeError: 'method' object is not subscriptable

Upvotes: 4

Views: 9341

Answers (1)

cs95
cs95

Reputation: 402493

Use transform to broadcast the result:

df['timestamp_count'] = (
    df.groupby(["source", "day"])['timestamp'].transform('nunique'))
df

   day    source                timestamp  timestamp_count
0    1  facebook  2018-08-04 11:16:32.416                2
1    1  facebook  2019-01-03 10:25:38.216                2
2    1   twitter  2018-10-14 13:26:22.123                1
3    2  facebook  2019-01-30 12:16:32.416                1

Upvotes: 9

Related Questions