Reputation: 12406
I am trying to add a column to a Pandas GroupBy DataFrame with a multi-index. The column is the difference between the max and mean value of a common key after grouping.
Here is the input DataFrame:
Main Reads Test Subgroup
0 1 5 54 1
1 2 2 55 1
2 1 10 56 2
3 2 20 57 3
4 1 7 58 3
Here is the code:
import numpy as np
import pandas as pd
df = pd.DataFrame({'Main': [1, 2, 1, 2, 1], 'Reads': [5, 2, 10, 20, 7],\
'Test':range(54,59), 'Subgroup':[1,1,2,3,3]})
result = df.groupby(['Main','Subgroup']).agg({'Reads':[np.max,np.mean]})
Here is the variable result
before perform the calculation of diff
:
Reads
amax mean
Main Subgroup
1 1 5 5
2 10 10
3 7 7
2 1 2 2
3 20 20
Next, I calculate the diff
column with:
result['Reads']['diff'] = result['Reads']['amax'] - result['Reads']['mean']
but here is the output:
/home/userd/test.py:9: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/
...stable/indexing.html#indexing-view-versus-copy
...result['Reads']['diff'] = result['Reads']['amax'] - result['Reads']['mean']
I would like the diff
column to be at the same level a amax
and mean
.
Is there a way to add a column to the innermost (bottom) column index of a multi-index GroupBy()
object in Pandas?
Upvotes: 1
Views: 1530
Reputation: 38415
You can access the multi-index using a tuple
result[('Reads','diff')] = result[('Reads','amax')] - result[('Reads','mean')]
You get
Reads
amax mean diff
Main Subgroup
1 1 5 5 0
2 10 10 0
3 7 7 0
2 1 2 2 0
3 20 20 0
Upvotes: 3
Reputation: 19957
#you can you lambda to build diff directly.
df.groupby(['Main','Subgroup']).agg({'Reads':[np.max,np.mean,lambda x: np.max(x)-np.mean(x)]}).rename(columns={'<lambda>':'diff'})
Out[2360]:
Reads
amax mean diff
Main Subgroup
1 1 5 5 0
2 10 10 0
3 7 7 0
2 1 2 2 0
3 20 20 0
Upvotes: 2
Reputation: 210902
Try this:
In [8]: result = df.groupby(['Main','Subgroup']).agg({'Reads':[np.max,np.mean, lambda x: x.max()-x.mean()]})
In [9]: result
Out[9]:
Reads
amax mean <lambda>
Main Subgroup
1 1 5 5 0
2 10 10 0
3 7 7 0
2 1 2 2 0
3 20 20 0
In [10]: result = result.rename(columns={'<lambda>':'diff'})
In [11]: result
Out[11]:
Reads
amax mean diff
Main Subgroup
1 1 5 5 0
2 10 10 0
3 7 7 0
2 1 2 2 0
3 20 20 0
Upvotes: 2