user1718097
user1718097

Reputation: 4292

Sorting Pandas dataframe data within Groupby groups

I have a large pandas dataframe that can be represented structurally as:

      id          date    status
0     12    2015-05-01         0
1     12    2015-05-22         1
2     12    2015-05-14         1
3     12    2015-05-06         0
4     45    2015-05-03         1
5     45    2015-05-12         1
6     45    2015-05-02         0
7     51    2015-05-05         1
8     51    2015-05-01         0
9     51    2015-05-23         1
10    51    2015-05-17         1
11    51    2015-05-03         0
12    51    2015-05-05         0
13    76    2015-05-04         1
14    76    2015-05-22         1
15    76    2015-05-08         0

And can be created in Python 3.4 using:

tempDF = pd.DataFrame({ 'id': [12,12,12,12,45,45,45,51,51,51,51,51,51,76,76,76],
                        'date': ['2015-05-01','2015-05-22','2015-05-14','2015-05-06','2015-05-03','2015-05-12','2015-05-02','2015-05-05','2015-05-01','2015-05-23','2015-05-17','2015-05-03','2015-05-05','2015-05-04','2015-05-22','2015-05-08'],
                        'status': [0,1,1,0,1,1,0,1,0,1,1,0,0,1,1,0]})
tempDF['date'] = pd.to_datetime(tempDF['date'])

I would like to divide the dataframe into groups based on variable 'id', sort within groups based on 'date' and then get the last 'status' value within each group.

So far, I have:

tempGrouped = tempDF.groupby('id')
tempGrouped['status'].last()

which produces:

id
12    0
45    0
51    0
76    0

However, the status should be 1 in each case (the value associated with the latest date). I can't work out how to sort the groups by date before selecting the last value. It's likely I'm a little snow-blind after trying to work this out for a while, so I apologise in advance if the solution is obvious.

Upvotes: 5

Views: 13555

Answers (1)

steboc
steboc

Reputation: 1181

you can sort and group like this :

tempDF.sort(['id','date']).groupby('id')['status'].last()

Upvotes: 14

Related Questions