edesz
edesz

Reputation: 12406

Python Pandas add column to multi-index GroupBy DataFrame

I am trying to add a column to a Pandas GroupBy DataFrame with a multi-index. The column is the difference between the max and mean value of a common key after grouping.

Here is the input DataFrame:

   Main  Reads  Test  Subgroup
0     1      5    54         1
1     2      2    55         1
2     1     10    56         2
3     2     20    57         3
4     1      7    58         3

Here is the code:

import numpy as np
import pandas as pd

df = pd.DataFrame({'Main': [1, 2, 1, 2, 1], 'Reads': [5, 2, 10, 20, 7],\
                   'Test':range(54,59), 'Subgroup':[1,1,2,3,3]})

result = df.groupby(['Main','Subgroup']).agg({'Reads':[np.max,np.mean]})

Here is the variable result before perform the calculation of diff:

              Reads     
               amax mean
Main Subgroup           
1    1            5    5
     2           10   10
     3            7    7
2    1            2    2
     3           20   20

Next, I calculate the diff column with:

result['Reads']['diff'] = result['Reads']['amax'] - result['Reads']['mean']

but here is the output:

/home/userd/test.py:9: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/
...stable/indexing.html#indexing-view-versus-copy
...result['Reads']['diff'] = result['Reads']['amax'] - result['Reads']['mean']

I would like the diff column to be at the same level a amax and mean.

Is there a way to add a column to the innermost (bottom) column index of a multi-index GroupBy() object in Pandas?

Upvotes: 1

Views: 1530

Answers (3)

Vaishali
Vaishali

Reputation: 38415

You can access the multi-index using a tuple

result[('Reads','diff')] = result[('Reads','amax')] - result[('Reads','mean')]

You get

                    Reads
                    amax    mean    diff
Main    Subgroup            
1       1           5       5       0
        2          10      10       0
        3           7       7       0
2       1           2       2       0
        3          20      20       0

Upvotes: 3

Allen Qin
Allen Qin

Reputation: 19957

#you can you lambda to build diff directly.
df.groupby(['Main','Subgroup']).agg({'Reads':[np.max,np.mean,lambda x: np.max(x)-np.mean(x)]}).rename(columns={'<lambda>':'diff'})
Out[2360]: 
              Reads          
               amax mean diff
Main Subgroup                
1    1            5    5    0
     2           10   10    0
     3            7    7    0
2    1            2    2    0
     3           20   20    0

Upvotes: 2

MaxU - stand with Ukraine
MaxU - stand with Ukraine

Reputation: 210902

Try this:

In [8]: result = df.groupby(['Main','Subgroup']).agg({'Reads':[np.max,np.mean, lambda x: x.max()-x.mean()]})

In [9]: result
Out[9]:
              Reads
               amax mean <lambda>
Main Subgroup
1    1            5    5        0
     2           10   10        0
     3            7    7        0
2    1            2    2        0
     3           20   20        0

In [10]: result = result.rename(columns={'<lambda>':'diff'})

In [11]: result
Out[11]:
              Reads
               amax mean diff
Main Subgroup
1    1            5    5    0
     2           10   10    0
     3            7    7    0
2    1            2    2    0
     3           20   20    0

Upvotes: 2

Related Questions