BENY

Reputation: 323306

What is the difference between pd.groupby().first() and pd.groupby().min()?

I have a DataFrame:

import numpy as np
import pandas as pd

df = pd.DataFrame({'Point_ID': [1, 2, 3, 1, 2, 1], 'Shape_ID': [84, 85, 86, 87, 88, 89], 'LOL': [0, 1, 0, 1, np.nan, np.nan]})

Out[1116]: 
   LOL  Point_ID  Shape_ID
0  0.0         1        84
1  1.0         2        85
2  0.0         3        86
3  1.0         1        87
4  NaN         2        88
5  NaN         1        89

When I run:

df.groupby('Point_ID').last()
Out[1114]: 
          LOL  Shape_ID
Point_ID               
1         1.0        89
2         1.0        88
3         0.0        86

For Shape_ID it returns the last value, but shouldn't it return NaN for LOL?

Using max(), I get the same answer as with last() when the DataFrame is sorted:

df.groupby('Point_ID').max()

Out[1115]: 
          LOL  Shape_ID
Point_ID               
1         1.0        89
2         1.0        88
3         0.0        86

I have read the pandas documentation for both first and last, but I cannot find the answer. Can anyone help? Much appreciated.

Upvotes: 2

Views: 107

Answers (2)

Vaishali

Reputation: 38415

It's just returning, for each Point_ID, the last value in each column (skipping NaN).

Consider this df, in which I added a row to your sample:

   LOL  Point_ID  Shape_ID
0    0         1        84
1    0         2        85
2    0         3        86
3    1         1        87
4    0         2        88
5   -1         1        89
6    2         1        25

If you group by Point_ID and call last():

df.groupby('Point_ID').last()

You get

          LOL  Shape_ID
Point_ID
1           2        25
2           0        88
3           0        86

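Here the value in LOL happens to be the max, but it is not computed as a max; it is just the LOL value from the last row with Point_ID 1. Shape_ID makes the difference visible: last() gives 25, while the max for Point_ID 1 is 89.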
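
As a quick check, here is a minimal sketch comparing last() and max() on that frame. It assumes the appended row is Point_ID 1 with LOL 2 and Shape_ID 25, which is what the output above implies; the variable names are only illustrative.

```python
import pandas as pd

# Seven-row sample; the final row is assumed to be Point_ID 1, LOL 2, Shape_ID 25
df = pd.DataFrame({
    "LOL": [0, 0, 0, 1, 0, -1, 2],
    "Point_ID": [1, 2, 3, 1, 2, 1, 1],
    "Shape_ID": [84, 85, 86, 87, 88, 89, 25],
})

grouped = df.groupby("Point_ID")
last_vals = grouped.last()  # value from the last row of each group (no NaN here)
max_vals = grouped.max()    # largest value in each column of each group

print(last_vals)
print(max_vals)
```

For Point_ID 1 the two results agree on LOL but not on Shape_ID, so the match in LOL is a coincidence of the data.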

Do go through this GitHub issue on the same topic; it says that, for the moment, skipping NaN is a feature of first/last. If you don't want that behaviour, use nth with dropna=False:

df.groupby('Point_ID').nth(-1,dropna=False)

          LOL  Shape_ID
Point_ID
1         NaN        89
2         NaN        88
3         0.0        86
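
If the goal is the physically last row of each group with NaN kept, another option that should not depend on how nth shapes its result is tail(1). This is only a sketch on the question's data, not the only way to do it; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# The question's sample data
df = pd.DataFrame({
    "Point_ID": [1, 2, 3, 1, 2, 1],
    "Shape_ID": [84, 85, 86, 87, 88, 89],
    "LOL": [0, 1, 0, 1, np.nan, np.nan],
})

# last() takes the last non-null value of each column, column by column
last_non_null = df.groupby("Point_ID").last()

# tail(1) keeps the last row of each group as it is, NaN included
last_row = df.groupby("Point_ID").tail(1).set_index("Point_ID").sort_index()

print(last_non_null)
print(last_row)
```

With last(), LOL for Point_ID 1 and 2 is 1.0 (taken from an earlier row), while tail(1) keeps the NaN that actually sits in the last row.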

Upvotes: 2

MaxU - stand with Ukraine

Reputation: 210872

Demo:

Let's shuffle your DataFrame:

In [339]: df = df.sample(frac=1)

In [340]: df
Out[340]:
   LOL  Point_ID  Shape_ID
4    0         2        88
0    0         1        84
1    0         2        85
3    1         1        87
2    0         3        86
5   -1         1        89

In [341]: df.groupby('Point_ID').min()
Out[341]:
          LOL  Shape_ID
Point_ID
1          -1        84
2           0        85  #  <----
3           0        86

In [342]: df.groupby('Point_ID').first()
Out[342]:
          LOL  Shape_ID
Point_ID
1           0        84
2           0        88  #  <----
3           0        86
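
To reproduce this without relying on a random shuffle, here is a small sketch that builds the frame directly in the shuffled order shown above. The variable names are illustrative, and the final sort step is my own addition to show when the two results coincide.

```python
import pandas as pd

# Rows entered in the shuffled order from the demo, original index labels kept
df = pd.DataFrame(
    {
        "LOL": [0, 0, 0, 1, 0, -1],
        "Point_ID": [2, 1, 2, 1, 3, 1],
        "Shape_ID": [88, 84, 85, 87, 86, 89],
    },
    index=[4, 0, 1, 3, 2, 5],
)

grouped = df.groupby("Point_ID")
by_position = grouped.first()  # depends on the order of the rows
by_value = grouped.min()       # depends only on the values

# Once the rows are sorted by Shape_ID, first() matches min() for that column
sorted_first = df.sort_values("Shape_ID").groupby("Point_ID")["Shape_ID"].first()

print(by_position)
print(by_value)
print(sorted_first)
```

So first()/last() are positional and min()/max() are value-based; they only agree when the row order happens to match the value order.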

Upvotes: 2
