Reputation: 173
I have a dataframe like below.
I need to do groupby(country and product) and Value column should contain count(id) where status is closed and I need to return remaining columns. Expected output format as below.
Sample input
id status ticket_time product country last_load_time metric_id name
1260057 open 2021-10-04 01:20:00 Broadband Grenada 2021-12-09 09:57:27 MTR013 repair
2998178 open 2021-10-02 00:00:00 Fixed Voice Bahamas 2021-12-09 09:57:27 MTR013 repair
3762949 closed 2021-10-01 00:00:00 Fixed Voice St Lucia 2021-12-09 09:57:27 MTR013 repair
3766608 closed 2021-10-04 00:00:00 Broadband St Lucia 2021-12-09 09:57:27 MTR013 repair
3767125 closed 2021-10-04 00:00:00 TV Antigua 2021-12-09 09:57:27 MTR013 repair
6050009 closed 2021-10-01 00:00:00 TV Jamaica 2021-12-09 09:57:27 MTR013 repair
6050608 open 2021-10-01 00:00:00 Broadband Jamaica 2021-12-09 09:57:27 MTR013 repair
6050972 open 2021-10-01 00:00:00 Broadband Jamaica 2021-12-09 09:57:27 MTR013 repair
6052253 closed 2021-10-02 00:00:00 Broadband Jamaica 2021-12-09 09:57:27 MTR013 repair
6053697 open 2021-10-03 00:00:00 Broadband Jamaica 2021-12-09 09:57:27 MTR013 repair
**EXPECTED OUTPUT FORMAT** SAMPLE
country product load_time metric_id name ticket_time Value(count(id)with status closed)
Antigua TV 2021-12-09 09:57:27 MTR013 pending_repair 2021-10-01 1
.... ... .... ... ... ... 2
I tried the below code:
df = new_df[new_df['status'] == 'closed'].groupby(['country', 'product']).agg(Value = pd.NamedAgg(column='id', aggfunc="size"))
df.reset_index(inplace=True)
But it is returning only three columns country, product and value.
I need the remaining columns which I mentioned in the above EXPECTED OUTPUT FORMAT. Also ,I tried
df1 = new_df[new_df['status'] == 'closed']
df1['Value'] = df1.groupby(['country', 'product'])['status'].transform('size')
df = df1.drop_duplicates(['country', 'product']).drop('status', axis=1)
Output
id ticket_time product country load_time metric_id name Value
3762949 2021-10-01 Fixed Voice St Lucia 2021-12-09 09:57:27 MTR013 pending_repair 23
3766608 2021-10-04 Broadband St Lucia 2021-12-09 09:57:27 MTR013 pending_repair 87
Second logic with transform returning id column which I don't want. Value column is based on count(id) where status is closed. I tried the above two methods but not able to get the expected output. Is there any way to handle this?
Upvotes: 0
Views: 80
Reputation: 103
When you group-by, typically, you're aggregating the data according to some category, so you won't keep all of the individual records, but will only be left with the columns that you've grouped-by and the column of aggregated data (a count, a mean, etc). However, the transform function will do what you want it to. I think this is what you were looking for based on your EXPECTED OUTPUT.
df_closed = df[df['status']=='closed'] # Filters data
df_closed = df_closed.reindex() # Resets index
df_closed['count_closed'] = df_closed.groupby('country')['status'].transform(len)
Upvotes: 1