The Great
The Great

Reputation: 7703

Elegant way to get min and max using pandas

I have a dataframe like as shown below

op1 = pd.DataFrame({
'subject_id':[1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2],
'date' : ['1/1/2017','1/1/2017','1/1/2017','1/2/2017','1/2/2017','1/2/2017','1/3/2017','1/3/2017','1/3/2017','1/4/2017','1/4/2017','1/4/2017','1/5/2017','1/5/2017','1/5/2017',
         '1/6/2017','1/6/2017','1/6/2017'],
'val' :[5,5,11,10,5,7,16,12,11,21,23,26,6,8,5,11,10,3]
})

What I would like to do is get the min and max for each subject every day.

Though my code works below, I feel this can be written in a much better way

t1 = op1.groupby(['subject_id','date'])['val'].max().reset_index()
t2 = op1.groupby(['subject_id','date'])['val'].min().reset_index()
t1.merge(t2,on=['subject_id','date'],how='inner',suffixes=('_max', '_min'))

The output should look like given below. Though my code works, it isn't elegant I feel. Is there any other way to write the max and min in one line?

enter image description here

Upvotes: 1

Views: 86

Answers (1)

jezrael
jezrael

Reputation: 862601

Use GroupBy.agg with tuples for new columns names and aggregate functions:

df = (op1.groupby(['subject_id','date'])['val']
         .agg([('val_max', 'max'),('val_min', 'min')])
         .reset_index())
print (df)
   subject_id      date  val_max  val_min
0           1  1/1/2017       11        5
1           1  1/2/2017       10        5
2           1  1/3/2017       16       11
3           2  1/4/2017       26       21
4           2  1/5/2017        8        5
5           2  1/6/2017       11        3

In pandas 0.25+ is possible use named aggregation:

df = (op1.groupby(['subject_id','date'])
         .agg(val_min=pd.NamedAgg(column='val', aggfunc='min'),
              val_max=pd.NamedAgg(column='val', aggfunc='max'))
         .reset_index())

Upvotes: 7

Related Questions