Reputation: 8421
Given that i have a dataset as below:
dt = {
"facility":["Ann Arbor","Ann Arbor","Detriot","Detriot","Detriot"],
"patient_ID":[4388,4388,9086,9086,9086],
"year":[2004,2007,2007,2008,2011],
"month":[8,9,9,6,2],
"Nr_Small":[0,0,5,12,10],
"Nr_Medium":[3,1,1,4,3],
"Nr_Large":[2,0,0,0,0],
"PeriodBetween2Visits" : [10,0,12,3,1],
"NumberOfVisits" : [2,2,3,3,3]
}
dt = pd.DataFrame(dt)
i need to keep groupby patient_ID
, then keep the facility
,patient_ID
,NumberOfVisits
, but Maximum and minimum of PeriodBetween2Visits
.
Here is what i tried:
dt = dt.groupby(['patient_ID'],as_index=False)["facility","patient_ID","PeriodBetween2Visits","NumberOfVisits"].agg({'PeriodBetween2Visits': ['min', 'max']})
dt.head()
But, it is not what i need!
A proper output for me as below:
Upvotes: 2
Views: 51
Reputation: 75080
I am using named aggregation here which was inbuilt in groupby and agg
recently:
dt.groupby(['facility','patient_ID']).agg(
Min_PeriodBetween2Visits=('PeriodBetween2Visits','min'),
Max_PeriodBetween2Visits=('PeriodBetween2Visits','max'),
NumberOfVisits=('NumberOfVisits','nunique')).reset_index()
facility patient_ID Min_PeriodBetween2Visits Max_PeriodBetween2Visits \
0 Ann Arbor 4388 0 10
1 Detriot 9086 1 12
NumberOfVisits
0 2
1 3
Upvotes: 2