Reputation: 91
I have an empty DataFrame with columns: [order_id, uid, payment_channel, user_paid_amount, vertical]
When I use df.groupby(['uid','vertical']).payment_channel.agg('count').reset_index()
it returns an empty DataFrame with Columns: [uid, vertical, total_transaction]
But when I use df.groupby(['uid','vertical']).user_paid_amount.agg('sum').reset_index()
it returns an empty DataFrame with Columns: [index, gmv]
How can I use the sum aggregation and still keep the uid and vertical columns?
EDIT: sample DataFrame
In []: empty_df = pd.DataFrame(columns=['uid','vertical','topup_payable_amount'])
       empty_df.dtypes
Out[]: uid                     object
       vertical                object
       topup_payable_amount    object
       dtype: object
Upvotes: 0
Views: 451
Reputation: 30971
On an empty DataFrame created the way you did, the results of your two statements are:
Empty DataFrame
Columns: [uid, vertical, payment_channel]
Index: []
and
Empty DataFrame
Columns: [index, user_paid_amount]
Index: []
Note that, as far as the aggregated column is concerned, I got the original column name.
You can rename this column by passing the name parameter to reset_index, e.g.
df.groupby(['uid','vertical']).user_paid_amount.agg('sum').reset_index(name='xyz')
(or whatever other name).
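To illustrate the name parameter on a frame that actually has rows, here is a minimal sketch. The sample values are hypothetical; only the column names and the gmv label come from the question.

```python
import pandas as pd

# Hypothetical sample rows; only the column names come from the question.
df = pd.DataFrame({
    "uid": ["u1", "u1", "u2"],
    "vertical": ["food", "food", "travel"],
    "user_paid_amount": [10.0, 5.0, 7.5],
})

# Series.reset_index(name=...) sets the name of the column that holds
# the aggregated values; the grouping keys become ordinary columns.
gmv = (
    df.groupby(["uid", "vertical"])
      .user_paid_amount.agg("sum")
      .reset_index(name="gmv")
)
print(gmv)
```

The grouping keys keep their own names, so only the aggregated column needs renaming.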
I use pandas 0.25.3 and Python 3.8.0. If you have an older version, upgrade and repeat the test.
And now let's get down to the names of grouping columns in the result.
Note that if you create an empty DataFrame, pandas has no information on the column types. Normally (if some data rows had been provided), the type of each column would have been inferred from the source data, but not in your case.
This is why the type of all columns (including user_paid_amount) is set to object.
The consequence is that the column is not treated as numeric, so sum does not behave as it would on a numeric column. Apparently, instead of raising an exception, the pandas code takes some "exceptional" path of execution, giving the above weird result (a grouping column named index).
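The dtype inference described above can be checked directly. A small sketch, assuming the column names from the question and one made-up row of data:

```python
import pandas as pd

cols = ["uid", "vertical", "user_paid_amount"]

# No rows: there is nothing to infer from, so every column is object.
no_rows = pd.DataFrame(columns=cols)
print(no_rows.dtypes)

# One hypothetical row: the numeric column is now inferred as float64.
one_row = pd.DataFrame([["u1", "food", 12.5]], columns=cols)
print(one_row.dtypes)
```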
After defining the DataFrame, change the column type, at least for user_paid_amount:
empty_df.user_paid_amount = empty_df.user_paid_amount.astype(float)
Then execution of:
print(empty_df.groupby(['uid','vertical']).user_paid_amount.agg('sum').reset_index())
gives the "normal" result:
Empty DataFrame
Columns: [uid, vertical, user_paid_amount]
Index: []
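As an alternative to converting afterwards, the dtypes can be declared when the frame is created. This is only a sketch: the variable name orders is hypothetical, and building each column from an empty typed Series is one possible way to do it, not the only one.

```python
import pandas as pd

# Declare dtypes up front by building each column from an empty, typed Series.
orders = pd.DataFrame({
    "uid": pd.Series(dtype="object"),
    "vertical": pd.Series(dtype="object"),
    "user_paid_amount": pd.Series(dtype="float64"),
})

# Same aggregation as in the question, now on a float64 column.
result = (
    orders.groupby(["uid", "vertical"])
          .user_paid_amount.agg("sum")
          .reset_index()
)
print(list(result.columns))
```

Rows appended later should then match the declared types, which keeps the aggregation on the numeric path.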
One final remark: don't use a name such as empty_df. This DataFrame is empty only for the time being, just after creation; at some point later it will contain data (and will no longer be empty).
Upvotes: 1