Alexander Hempfing
Alexander Hempfing

Reputation: 247

Subtract argument-specific column mean in pandas DataFrame

I would like to calculate and subtract an argument specific (here industry-specific) mean by columns using a Panda Dataframe.

My data look as follows:

NAT SEC     2006    2007    2008
AUS D01T03  6750.0  7138.0  ... 
AUS D09     4926.0  6092.0  ... 
AUT D01T03  4926.0  5969.0  ... 
AUT D09     3381.0  3310.0  ... 
BEL D01T03  11733.0 12883.0 ... 
BEL D09     1938.0  1937.0  ...

Now I would like to calculate separately, or directly (as possible), the mean of both arguments in column SEC which looks as follows:

TYP     SEC     2006    2007    ...
Mean    D01T03  7803.0  8663.3  ...
Mean    D09     3415.0  4049.6  ...

In the final step, I would like to subtract this (in this case industry-specific) mean from the "original" DataFrame. Thus it could look like this:

NAT SEC     2006    2007    2008
AUS D01T03  -1053.0 ... ...
AUS D09     1511.0  ... ...
AUT D01T03  -2877.0 ... ...
AUT D09     -34.0   ... ...
BEL D01T03  3930.0  ... ...
BEL D09     -1477.0 ... ...

Unfortunately, I did not find any suitable threads until now and would be very happy if someone of you could please help me here or forward the suitable thread! Thank you in advance!

Best regards Alex

Upvotes: 0

Views: 139

Answers (1)

sammywemmy
sammywemmy

Reputation: 28729

A combination of groupby and subtraction should suffice:

#set SEC as index
df = df.set_index('SEC').sort_index()

#aggregate the mean by the index
aggregation = df.groupby(level=0).mean()

#get the numeric only columns
num_cols = df.filter(regex='\d').columns

#assign the difference between the numeric columns and the aggregates back to the dataframe
df.loc[:,num_cols] = df.filter(regex='\d').sub(aggregation)

#sort by NAT 
df = df.sort_values('NAT')

df

        NAT 2006    2007
SEC         
D01T03  AUS -1053.0 -1525.333333
D09     AUS 1511.0  2312.333333
D01T03  AUT -2877.0 -2694.333333
D09     AUT -34.0   -469.666667
D01T03  BEL 3930.0  4219.666667
D09     BEL -1477.0 -1842.666667

Upvotes: 1

Related Questions