Reputation: 1903
This is a sample of what my df looks like:
normalized_page_url impression_id ts user_ip_country
0 viosdkfjki-o1 6954BBC 2022-01-08 za
1 vd/dkjfduof-at 061E9974B233 2022-01-08 pk
2 vd-le-se-fn-pase-ri 170331464 2022-01-08 gp
3 vntaetal-mia-mre 4EC9C93E4 2022-01-08 ru
4 viater-g-kfrom-id 6B4A846D6 2022-01-08 jp
However this is what I want it to look like :
normalized_page_url imp_id_count ts user_ip_country
0 blah blah blah 2 2022-01-08 za
1 vd/dkjfduof-at 2 2022-01-08 pk
2 extra blah blah. 1 2022-01-08 gp
3 vntaetal-mia-mre 2 2022-01-08 ru
4 viater-g-kfrom-id 1 2022-01-08 jp
I've tried this but it just groups by all columns and it doesn't return impression_id count
df.groupby(['normalized_page_url', 'ts', 'user_ip_country','impression_id'])
also tried this but it doesn't look like it did anything:
df.groupby(['normalized_page_url','impression_id', 'ts', 'user_ip_country']).agg({'impression_id':'count'})
If it helps, this is how I have the query running in snowflake, this works as I'd like it to I'm just trying to get it like this in pandas:
SELECT NORMALIZED_PAGE_URL, to_date(ts) as ts_date, USER_IP_COUNTRY, count(impression_id) as imp_id_count
FROM my_table
group by 1, 2,3
Upvotes: 0
Views: 2345
Reputation: 1903
I think I got it!
df = df.groupby(['normalized_page_url', 'ts', 'user_ip_country']).agg(
imp_id_count=('impression_id','count'))
Upvotes: 1