Reputation: 323306
I have a DataFrame:
import numpy as np
import pandas as pd
df = pd.DataFrame({'Point_ID': [1, 2, 3, 1, 2, 1], 'Shape_ID': [84, 85, 86, 87, 88, 89], 'LOL': [0, 1, 0, 1, np.nan, np.nan]})
Out[1116]:
LOL Point_ID Shape_ID
0 0.0 1 84
1 1.0 2 85
2 0.0 3 86
3 1.0 1 87
4 NaN 2 88
5 NaN 1 89
When I ran:
df.groupby('Point_ID').last()
Out[1114]:
LOL Shape_ID
Point_ID
1 1.0 89
2 1.0 88
3 0.0 86
For Shape_ID it returned the last value, but for LOL, shouldn't it return NaN for Point_ID 1 and 2?
Using max, I get the same result as with last() when the DataFrame is sorted:
df.groupby('Point_ID').max()
Out[1115]:
LOL Shape_ID
Point_ID
1 1.0 89
2 1.0 88
3 0.0 86
I have read the pandas documentation for both first and last, but I cannot find the answer there.
Can anyone help? Much appreciated.
Upvotes: 2
Views: 107
Reputation: 38415
It's just returning, for each column, the value from the last row of each Point_ID group.
Consider this df, in which I changed some values and added a row to your sample:
LOL Point_ID Shape_ID
0 0 1 84
1 0 2 85
2 0 3 86
3 1 1 87
4 0 2 88
5 -1 1 89
6 1 2 25
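If you group by Point_ID and take last():
df.groupby('Point_ID').last()
you get
LOL Shape_ID
Point_ID
1 -1 89
2 1 25
3 0 86
For Point_ID 2 the value in LOL happens to equal the group's max, but it is not computed as a max; it is simply the LOL value from the last row with Point_ID 2. For Point_ID 1 the last value (-1) is in fact the group's minimum.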
Do go through the pandas GitHub issue on this; it says that, for the moment, skipping NaN is a feature of first/last. If you don't want that behaviour, use nth with dropna=False (shown here on your original df):
df.groupby('Point_ID').nth(-1,dropna=False)
LOL Shape_ID
Point_ID
1 NaN 89
2 NaN 88
3 0.0 86
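To make the difference concrete, here is a minimal sketch on the question's data that puts last() next to the physically last row of each group. It uses groupby(...).tail(1) instead of nth; that substitution is my own choice, made because I am not certain how nth's dropna argument and output index behave across pandas versions. The names last_non_null and last_row are just illustrative.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'Point_ID': [1, 2, 3, 1, 2, 1],
    'Shape_ID': [84, 85, 86, 87, 88, 89],
    'LOL': [0, 1, 0, 1, np.nan, np.nan],
})

# last() works column by column and returns the last non-null value per group
last_non_null = df.groupby('Point_ID').last()

# tail(1) keeps the last row of each group exactly as stored, NaN included;
# set_index/sort_index only reshape the result to look like the output above
last_row = df.groupby('Point_ID').tail(1).set_index('Point_ID').sort_index()

print(last_non_null)
print(last_row)
```

With this data, LOL for Point_ID 1 and 2 should come out as 1.0 from last() but NaN in last_row, while Shape_ID is the same in both.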
Upvotes: 2
Reputation: 210872
Demo:
let's shuffle your DF:
In [339]: df = df.sample(frac=1)
In [340]: df
Out[340]:
LOL Point_ID Shape_ID
4 0 2 88
0 0 1 84
1 0 2 85
3 1 1 87
2 0 3 86
5 -1 1 89
In [341]: df.groupby('Point_ID').min()
Out[341]:
LOL Shape_ID
Point_ID
1 -1 84
2 0 85 # <----
3 0 86
In [342]: df.groupby('Point_ID').first()
Out[342]:
LOL Shape_ID
Point_ID
1 0 84
2 0 88 # <----
3 0 86
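The same demo can be reproduced without relying on a random shuffle. The sketch below assumes the NaN-free sample used in the previous answer's table (minus its extra row) and reorders the rows explicitly to match the shuffled order shown above; the names shuffled, by_min and by_first are illustrative.

```python
import pandas as pd

# NaN-free variant of the sample, as used in the demo above
df = pd.DataFrame({
    'Point_ID': [1, 2, 3, 1, 2, 1],
    'Shape_ID': [84, 85, 86, 87, 88, 89],
    'LOL': [0, 0, 0, 1, 0, -1],
})

# Fixed reordering that mirrors the shuffled frame shown above
shuffled = df.loc[[4, 0, 1, 3, 2, 5]]

# min() ignores row order; first() depends on it
by_min = shuffled.groupby('Point_ID').min()
by_first = shuffled.groupby('Point_ID').first()

print(by_min)
print(by_first)
```

For Point_ID 2, min() gives Shape_ID 85 regardless of order, whereas first() gives 88 because that row now comes first in the group.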
Upvotes: 2