MPA

Reputation: 1117

Group by and value_counts

My DataFrame looks like this:

session_id page_type
10001_0    a
10001_0    b
10001_0    b
10001_0    b
10001_0    c
10001_0    c
10002_0    a
10002_0    a
10002_0    b
10002_0    b
10002_0    c
10002_0    c

I want to group by 'session_id' and count the occurrences of each value ('a', 'b', 'c') in 'page_type', like this:

session_id count_page_type
10001_0 {a:1,b:3,c:2}
10002_0 {a:2,b:2,c:2}

I don't care about the type of the 'count_page_type' column; it can be a list as well. The aggregation runs over multiple columns:

agg_dict = ({'uid':'first',
             'request_id':'unique',
             'sso_id':'first',
             'article_id' :['first','last','nunique'],
             'event_time':['min','max'],
             'session_duration':'sum',
             'anonymous_id':['first','nunique'],
             'platform':['first','nunique'],
             'brand':['first','last','nunique'],
             'user_type':['first','last'],
             'page_type':'value_counts'})
df.groupby('session_id').agg(agg_dict)

Now I am getting this error:

ValueError: cannot insert page_type, already exists

Any suggestions? Thanks.

Upvotes: 3

Views: 113

Answers (1)

Ayoub ZAROU

Reputation: 2417

value_counts returns a pd.Series for each group rather than a single value per group, so agg cannot place the result in one row. Try converting the counts to a dict first, something like:

df.groupby('session_id').agg({'page_type': lambda x : x.value_counts().to_dict()})
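As a rough check, here is a minimal sketch that rebuilds the sample data from the question and applies the same idea alongside one other aggregation. The session_duration values are made up for illustration, and the helper name count_values is hypothetical; only session_id and page_type come from the question.

```python
import pandas as pd

# Sample data from the question; session_duration values are invented for illustration.
df = pd.DataFrame({
    "session_id": ["10001_0"] * 6 + ["10002_0"] * 6,
    "page_type": ["a", "b", "b", "b", "c", "c",
                  "a", "a", "b", "b", "c", "c"],
    "session_duration": [1, 2, 3, 4, 5, 6, 1, 1, 1, 1, 1, 1],
})

def count_values(s):
    # Collapse the per-group value_counts Series into a single dict,
    # so each group yields exactly one object.
    return s.value_counts().to_dict()

agg_dict = {
    "session_duration": "sum",
    "page_type": count_values,
}

result = df.groupby("session_id").agg(agg_dict)
print(result)
```

Each cell in the page_type column then holds one dict per session, which should sit fine next to the other entries of a larger agg_dict.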

Upvotes: 3
