KristiLuna
KristiLuna

Reputation: 1903

Pandas - Group by multiple columns and get count of 1 of the columns

This is a sample of what my df looks like:

   normalized_page_url   impression_id      ts           user_ip_country
0   viosdkfjki-o1          6954BBC         2022-01-08          za
1  vd/dkjfduof-at          061E9974B233    2022-01-08          pk
2  vd-le-se-fn-pase-ri     170331464       2022-01-08          gp
3    vntaetal-mia-mre      4EC9C93E4       2022-01-08          ru
4   viater-g-kfrom-id      6B4A846D6      2022-01-08           jp

However this is what I want it to look like :

   normalized_page_url   imp_id_count      ts           user_ip_country
0   blah blah blah          2            2022-01-08          za
1  vd/dkjfduof-at           2            2022-01-08          pk
2  extra blah blah.         1            2022-01-08          gp
3    vntaetal-mia-mre       2            2022-01-08          ru
4   viater-g-kfrom-id       1            2022-01-08           jp

I've tried this but it just groups by all columns and it doesn't return impression_id count

df.groupby(['normalized_page_url', 'ts', 'user_ip_country','impression_id'])

also tried this but it doesn't look like it did anything:

df.groupby(['normalized_page_url','impression_id', 'ts', 'user_ip_country']).agg({'impression_id':'count'})

If it helps, this is how I have the query running in snowflake, this works as I'd like it to I'm just trying to get it like this in pandas:

SELECT  NORMALIZED_PAGE_URL, to_date(ts) as ts_date, USER_IP_COUNTRY, count(impression_id) as imp_id_count
FROM my_table
group by 1, 2,3

Upvotes: 0

Views: 2345

Answers (1)

KristiLuna
KristiLuna

Reputation: 1903

I think I got it!

df = df.groupby(['normalized_page_url', 'ts', 'user_ip_country']).agg(
    imp_id_count=('impression_id','count'))

Upvotes: 1

Related Questions