Reputation: 7703
I have a dataframe like as shown below
op1 = pd.DataFrame({
'subject_id':[1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2],
'date' : ['1/1/2017','1/1/2017','1/1/2017','1/2/2017','1/2/2017','1/2/2017','1/3/2017','1/3/2017','1/3/2017','1/4/2017','1/4/2017','1/4/2017','1/5/2017','1/5/2017','1/5/2017',
'1/6/2017','1/6/2017','1/6/2017'],
'val' :[5,5,11,10,5,7,16,12,11,21,23,26,6,8,5,11,10,3]
})
What I would like to do is get the min
and max
for each subject every day.
Though my code works below, I feel this can be written in a much better way
t1 = op1.groupby(['subject_id','date'])['val'].max().reset_index()
t2 = op1.groupby(['subject_id','date'])['val'].min().reset_index()
t1.merge(t2,on=['subject_id','date'],how='inner',suffixes=('_max', '_min'))
The output should look like given below. Though my code works, it isn't elegant I feel. Is there any other way to write the max
and min
in one line?
Upvotes: 1
Views: 86
Reputation: 862601
Use GroupBy.agg
with tuples for new columns names and aggregate functions:
df = (op1.groupby(['subject_id','date'])['val']
.agg([('val_max', 'max'),('val_min', 'min')])
.reset_index())
print (df)
subject_id date val_max val_min
0 1 1/1/2017 11 5
1 1 1/2/2017 10 5
2 1 1/3/2017 16 11
3 2 1/4/2017 26 21
4 2 1/5/2017 8 5
5 2 1/6/2017 11 3
In pandas 0.25+ is possible use named aggregation
:
df = (op1.groupby(['subject_id','date'])
.agg(val_min=pd.NamedAgg(column='val', aggfunc='min'),
val_max=pd.NamedAgg(column='val', aggfunc='max'))
.reset_index())
Upvotes: 7