Reputation: 1482
Some excercise is asking to "add a secondary name" to a df column when applying an aggregation function.
Given following DF:
stype = ['s1','s1','s1','s1',
's2','s2','s2','s2']
dtype = ['d1','d1','d2','d2',
'd1','d1','d2','d2']
qty = [2, 1.4, 7, 3,
4, 1, 2.0, 3]
df = pd.DataFrame({'s_type':stype,
'd_type':dtype,
'qty':qty})
When grouping by the first 2 columns and applying an agg function like this
new_df=df.groupby(['s_type','d_type'],sort = False).agg({'qty':'median'})
And do print(new_df)
I get this output, notice the qty
is moved up a line (I assume this is normal behavior) because is the one column where agg function was applied
qty
s_type d_type
s1 d1 1.7
d2 5.0
s2 d1 2.5
d2 2.5
Question(s ) would it be
Is it possible to setc secondary column name under qty
label/name ?, to get an output like this:
qty
s_type d_type Median
s1 d1 1.7
d2 5.0
s2 d1 2.5
d2 2.5
Also
Why qty
moves up 1 line ? (I'm using Spyder and I've seen this in the Anaconda/python console as well, not sure if its like a feature or pandas normal behavior).
One Last update: in case you need to use sort_values you will have to use the tuple for the column you add a secondary index/name, pretty much in the same way when applying the aggregate function (see the answer)
new_df=new_df.sort_values([('qty','MEDIAN')])
So you get the DF sorted by the column where agg function was applied and where we add an index
qty
MEDIAN
s_type d_type
s1 d1 1.7
s2 d1 2.5
d2 2.5
s1 d2 5.0
python v3.6
pandas v0.23
Thank you.
Upvotes: 2
Views: 63
Reputation: 402253
Super simple. Pass "median" as a list inside agg
:
result = df.groupby(['s_type','d_type'], sort=False).agg({'qty': ['median']})
result
qty
median
s_type d_type
s1 d1 1.7
d2 5.0
s2 d1 2.5
d2 2.5
As for "why qty moves up 1 line", that means the first two columns are actually the index in the output. Querying result.index
will make it obvious:
result.index
MultiIndex(levels=[['s1', 's2'], ['d1', 'd2']],
codes=[[0, 0, 1, 1], [0, 1, 0, 1]],
names=['s_type', 'd_type'])
You can instead specify as_index=False
when calling groupby,
df.groupby(['s_type','d_type'], sort=False, as_index=False).agg({'qty': ['median']})
s_type d_type qty
median
0 s1 d1 1.7
1 s1 d2 5.0
2 s2 d1 2.5
3 s2 d2 2.5
So the groupers remain columns in the output.
You can change the name in the output by passing a list of tuples to agg
:
df.groupby(['s_type','d_type'], sort=False).agg(
{'qty': [('MEDIAN', 'median')]}
)
qty
MEDIAN
s_type d_type
s1 d1 1.7
d2 5.0
s2 d1 2.5
d2 2.5
Upvotes: 2