Reputation: 49
I have a dataset that looks like this.
ID BRAND_NAME PRICE COLOR ACTUAL PREDICTED MAPE ACCURACY
1 mjjk 657 Black 0 1 0 1
1 mjjk 657 Black 1 1 0 1
I want my dataset to be grouped by ID so that it looks like this:
ID BRAND_NAME PRICE COLOR ACTUAL PREDICTED MAPE ACCURACY
1 mjjk 657 Black 1 2 1 0
where the ACTUAL and PREDICTED columns hold the sums of the actual and predicted values for that ID, and MAPE and ACCURACY are calculated from those sums: MAPE = (PREDICTED - ACTUAL) / ACTUAL and ACCURACY = 1 - MAPE.
I am stuck on how to group by ID and get the corresponding values.
Upvotes: 0
Views: 71
Reputation:
Use groupby.agg. Group by ID and take the first of 'BRAND_NAME', 'PRICE', and 'COLOR' and the sum of 'ACTUAL' and 'PREDICTED'. Then do computations on the output dataframe to obtain the 'MAPE' and 'ACCURACY' columns:
out = df.groupby('ID').agg({'BRAND_NAME':'first','PRICE':'first','COLOR':'first', 'ACTUAL':'sum', 'PREDICTED':'sum'}).reset_index()
out['MAPE'] = (out['PREDICTED'] - out['ACTUAL']) / out['ACTUAL']
out['ACCURACY'] = 1 - out['MAPE']
Output:
ID BRAND_NAME PRICE COLOR ACTUAL PREDICTED MAPE ACCURACY
0 1 mjjk 657 Black 1 2 1.0 0.0
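To try this end to end, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the two sample rows in the question, and the helper name summarize_by_id is made up for illustration (it is not a pandas function).

```python
import pandas as pd

# Sample rows copied from the question (MAPE/ACCURACY are recomputed below)
df = pd.DataFrame({
    "ID": [1, 1],
    "BRAND_NAME": ["mjjk", "mjjk"],
    "PRICE": [657, 657],
    "COLOR": ["Black", "Black"],
    "ACTUAL": [0, 1],
    "PREDICTED": [1, 1],
})


def summarize_by_id(frame):
    """Hypothetical helper: one row per ID with summed ACTUAL/PREDICTED."""
    out = frame.groupby("ID").agg({
        "BRAND_NAME": "first",  # assumes these are constant within an ID
        "PRICE": "first",
        "COLOR": "first",
        "ACTUAL": "sum",
        "PREDICTED": "sum",
    }).reset_index()
    # Derived columns are computed from the per-ID sums
    out["MAPE"] = (out["PREDICTED"] - out["ACTUAL"]) / out["ACTUAL"]
    out["ACCURACY"] = 1 - out["MAPE"]
    return out


out = summarize_by_id(df)
print(out)
```

Taking 'first' only makes sense if BRAND_NAME, PRICE, and COLOR never vary within an ID, which holds for the sample rows but is an assumption about the full dataset.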
Upvotes: 1
Reputation: 34046
Use GroupBy.agg:
In [121]: res = df.groupby(['ID', 'BRAND_NAME', 'COLOR'], as_index=False).agg({'PRICE': 'max', 'ACTUAL': 'sum', 'PREDICTED': 'sum'})
In [123]: res['MAPE'] = (res.PREDICTED - res.ACTUAL).div(res.ACTUAL)
In [125]: res['ACCURACY'] = 1 - res.MAPE
In [126]: res
Out[126]:
ID BRAND_NAME COLOR PRICE ACTUAL PREDICTED MAPE ACCURACY
0 1 mjjk Black 657 1 2 1.0 0.0
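One caveat with the MAPE formula: if the ACTUAL values for an ID sum to 0, the division produces inf (or NaN when the numerator is also 0). A possible guard, assuming you would rather see NaN for those IDs, is to swap zeros for NaN before dividing. The two-row frame below is invented purely to show the effect.

```python
import numpy as np
import pandas as pd

# Hypothetical already-grouped data: ID 2 has an ACTUAL sum of 0
res = pd.DataFrame({"ID": [1, 2], "ACTUAL": [1, 0], "PREDICTED": [2, 3]})

# Replace 0 with NaN in the denominator so the result is NaN instead of inf
safe_actual = res["ACTUAL"].replace(0, np.nan)
res["MAPE"] = (res["PREDICTED"] - res["ACTUAL"]) / safe_actual
res["ACCURACY"] = 1 - res["MAPE"]
print(res)
```

Whether NaN, inf, or dropping the row is the right treatment depends on how the metric is used downstream.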
Upvotes: 1