Reputation: 4292
I have a large pandas dataframe that can be represented structurally as:
id date status
0 12 2015-05-01 0
1 12 2015-05-22 1
2 12 2015-05-14 1
3 12 2015-05-06 0
4 45 2015-05-03 1
5 45 2015-05-12 1
6 45 2015-05-02 0
7 51 2015-05-05 1
8 51 2015-05-01 0
9 51 2015-05-23 1
10 51 2015-05-17 1
11 51 2015-05-03 0
12 51 2015-05-05 0
13 76 2015-05-04 1
14 76 2015-05-22 1
15 76 2015-05-08 0
And can be created in Python 3.4 using:
tempDF = pd.DataFrame({ 'id': [12,12,12,12,45,45,45,51,51,51,51,51,51,76,76,76],
'date': ['2015-05-01','2015-05-22','2015-05-14','2015-05-06','2015-05-03','2015-05-12','2015-05-02','2015-05-05','2015-05-01','2015-05-23','2015-05-17','2015-05-03','2015-05-05','2015-05-04','2015-05-22','2015-05-08'],
'status': [0,1,1,0,1,1,0,1,0,1,1,0,0,1,1,0]})
tempDF['date'] = pd.to_datetime(tempDF['date'])
I would like to divide the dataframe into groups based on variable 'id', sort within groups based on 'date' and then get the last 'status' value within each group.
So far, I have:
tempGrouped = tempDF.groupby('id')
tempGrouped['status'].last()
which produces:
id
12 0
45 0
51 0
76 0
However, the status should be 1 in each case (the value associated with the latest date). I can't work out how to sort the groups by date before selecting the last value. It's likely I'm a little snow-blind after trying to work this out for a while, so I apologise in advance if the solution is obvious.
Upvotes: 5
Views: 13555
Reputation: 1181
you can sort and group like this :
tempDF.sort(['id','date']).groupby('id')['status'].last()
Upvotes: 14