Reputation: 247
I would like to calculate and subtract an argument specific (here industry-specific) mean by columns using a Panda Dataframe.
My data look as follows:
NAT SEC 2006 2007 2008
AUS D01T03 6750.0 7138.0 ...
AUS D09 4926.0 6092.0 ...
AUT D01T03 4926.0 5969.0 ...
AUT D09 3381.0 3310.0 ...
BEL D01T03 11733.0 12883.0 ...
BEL D09 1938.0 1937.0 ...
Now I would like to calculate separately, or directly (as possible), the mean of both arguments in column SEC
which looks as follows:
TYP SEC 2006 2007 ...
Mean D01T03 7803.0 8663.3 ...
Mean D09 3415.0 4049.6 ...
In the final step, I would like to subtract this (in this case industry-specific) mean from the "original" DataFrame. Thus it could look like this:
NAT SEC 2006 2007 2008
AUS D01T03 -1053.0 ... ...
AUS D09 1511.0 ... ...
AUT D01T03 -2877.0 ... ...
AUT D09 -34.0 ... ...
BEL D01T03 3930.0 ... ...
BEL D09 -1477.0 ... ...
Unfortunately, I did not find any suitable threads until now and would be very happy if someone of you could please help me here or forward the suitable thread! Thank you in advance!
Best regards Alex
Upvotes: 0
Views: 139
Reputation: 28729
A combination of groupby and subtraction should suffice:
#set SEC as index
df = df.set_index('SEC').sort_index()
#aggregate the mean by the index
aggregation = df.groupby(level=0).mean()
#get the numeric only columns
num_cols = df.filter(regex='\d').columns
#assign the difference between the numeric columns and the aggregates back to the dataframe
df.loc[:,num_cols] = df.filter(regex='\d').sub(aggregation)
#sort by NAT
df = df.sort_values('NAT')
df
NAT 2006 2007
SEC
D01T03 AUS -1053.0 -1525.333333
D09 AUS 1511.0 2312.333333
D01T03 AUT -2877.0 -2694.333333
D09 AUT -34.0 -469.666667
D01T03 BEL 3930.0 4219.666667
D09 BEL -1477.0 -1842.666667
Upvotes: 1