Jeff
Jeff

Reputation: 8421

How to set aggregation after groupby?

Given that i have a dataset as below:

dt = {
    "facility":["Ann Arbor","Ann Arbor","Detriot","Detriot","Detriot"],
    "patient_ID":[4388,4388,9086,9086,9086],
    "year":[2004,2007,2007,2008,2011],
    "month":[8,9,9,6,2],
    "Nr_Small":[0,0,5,12,10],
    "Nr_Medium":[3,1,1,4,3],
    "Nr_Large":[2,0,0,0,0],
    "PeriodBetween2Visits" : [10,0,12,3,1],
    "NumberOfVisits" : [2,2,3,3,3]

}

dt = pd.DataFrame(dt)

i need to keep groupby patient_ID, then keep the facility,patient_ID,NumberOfVisits, but Maximum and minimum of PeriodBetween2Visits.

Here is what i tried:

dt = dt.groupby(['patient_ID'],as_index=False)["facility","patient_ID","PeriodBetween2Visits","NumberOfVisits"].agg({'PeriodBetween2Visits': ['min', 'max']})


dt.head()

But, it is not what i need!

A proper output for me as below:

enter image description here

Upvotes: 2

Views: 51

Answers (1)

anky
anky

Reputation: 75080

I am using named aggregation here which was inbuilt in groupby and agg recently:

 dt.groupby(['facility','patient_ID']).agg(
                 Min_PeriodBetween2Visits=('PeriodBetween2Visits','min'),
                 Max_PeriodBetween2Visits=('PeriodBetween2Visits','max'),
                 NumberOfVisits=('NumberOfVisits','nunique')).reset_index()

    facility  patient_ID  Min_PeriodBetween2Visits  Max_PeriodBetween2Visits  \
0  Ann Arbor        4388                         0                        10   
1    Detriot        9086                         1                        12   

   NumberOfVisits  
0               2  
1               3  

Upvotes: 2

Related Questions