Reputation: 6749
I have a spreadsheet which has data in the following format:
Brand | Model | Year | Cost | Tax
--------------------------------------
Apple | iPhone 7 | 2017 | $1000 | $100
Apple | iphone 7 | 2018 | $800 | $80
Xiomi | Note 5 | 2017 | $300 | $30
Xiomi | Note 5 | 2018 | $200 | $20
I want to transform the above data set into the following, where a Mean column shows the mean of the Cost column when rows are grouped by ['Brand', 'Model'], and a Result column holds the sum of the Mean and Tax column values:
Brand | Model | Year | Cost | Mean | Tax | Result
------------------------------------------------------------
Apple | iPhone 7 | 2017 | $1000 | $900 | $100 | $1000
Apple | iphone 7 | 2018 | $800 | $900 | $80 | $980
Xiomi | Note 5 | 2017 | $300 | $250 | $30 | $280
Xiomi | Note 5 | 2018 | $200 | $250 | $20 | $270
I have been trying the groupby function but cannot find a way to get the desired result shown above.
Looking forward to your responses. Thank you.
Upvotes: 1
Views: 42
Reputation: 862601
First convert the values to integers with replace, get the mean with transform, then sum, and finally convert back to strings if necessary:
cols = ['Cost','Tax']
df[cols] = df[cols].replace(r'\$', '', regex=True).astype(int)
df['Mean'] = df.groupby(['Brand', 'Model'])['Cost'].transform('mean')
df['Result'] = df[['Mean','Tax']].sum(axis=1)
print(df)
   Brand     Model  Year  Cost  Tax  Mean  Result
0  Apple  iPhone 7  2017  1000  100  1000    1100
1  Apple  iphone 7  2018   800   80   800     880
2  Xiomi    Note 5  2017   300   30   250     280
3  Xiomi    Note 5  2018   200   20   250     270
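Note that the two Apple rows land in separate groups because the sample data spells the model as 'iPhone 7' and 'iphone 7', so their mean is not 900. If that difference is only a typo, one option is to group on a case-normalized key. A minimal sketch, assuming the sample data from the question already converted to integers; the name model_key is just an illustrative helper, not part of the original code:

```python
import pandas as pd

# Sample data from the question, with Cost and Tax already as integers
df = pd.DataFrame({
    'Brand': ['Apple', 'Apple', 'Xiomi', 'Xiomi'],
    'Model': ['iPhone 7', 'iphone 7', 'Note 5', 'Note 5'],
    'Year': [2017, 2018, 2017, 2018],
    'Cost': [1000, 800, 300, 200],
    'Tax': [100, 80, 30, 20],
})

# Hypothetical helper key: lower-cased model name, used only for grouping,
# so the original Model column keeps its spelling
model_key = df['Model'].str.lower()

# Group by Brand plus the normalized key and broadcast the mean to each row
df['Mean'] = df.groupby(['Brand', model_key])['Cost'].transform('mean')
df['Result'] = df['Mean'] + df['Tax']
print(df)
```

With this key both Apple rows share one group, so Mean should come out as 900 for both of them. The string conversion step works the same way afterwards.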
And then:
cols1 = cols + ['Result', 'Mean']
df[cols1] = '$' + df[cols1].astype(str)
print(df)
   Brand     Model  Year   Cost   Tax   Mean Result
0  Apple  iPhone 7  2017  $1000  $100  $1000  $1100
1  Apple  iphone 7  2018   $800   $80   $800   $880
2  Xiomi    Note 5  2017   $300   $30   $250   $280
3  Xiomi    Note 5  2018   $200   $20   $250   $270
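One caveat: in recent pandas versions transform('mean') returns floats, so astype(str) may produce strings like '$900.0' instead of '$900'. A hedged sketch of one way to control the formatting, assuming whole-dollar output is wanted; the function to_dollars and the two-row frame are made up for illustration:

```python
import pandas as pd

# Small illustrative frame: an integer column and a float column
df = pd.DataFrame({
    'Cost': [1000, 800],
    'Mean': [900.0, 900.0],
})

def to_dollars(value):
    """Format a number as a whole-dollar string (hypothetical helper)."""
    return f'${value:.0f}'

# Apply the formatter element-wise to each money column
for col in ['Cost', 'Mean']:
    df[col] = df[col].map(to_dollars)
print(df)
```

The '.0f' format rounds to whole dollars; use '.2f' instead if cents should be kept.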
Upvotes: 1