aloha
aloha

Reputation: 4774

select a row based on the highest value of a column python

The dataFrame looks something like this: Name of person and weight at a given date.

  Name   date      w
1 Mike 2019-01-21 89.1
2 Mike 2018-11-12 88.1
3 Mike 2018-03-14 87.2
4 Hans 2019-03-21 66.5
5 Hans 2018-03-12 57.4
6 Hans 2017-04-21 55.3
7 Hans 2016-10-12 nan

I want to select the last time Hans has logged in his weight. So the answer would be

4 Hans 2019-03-21 66.5

Here's what I successfully managed to do:

# select Hans data that don't have nans
cond = ( data['Name'] == 'Hans' )
a = data.loc[ cond ] 
a = a.dropna()       

# get the index of the most recent weight
b = d['date'].str.split('-', expand=True) # split the date to get the year

now b looks like this

print(b)
#4 2019 03 21
#5 2018 03 12
#6 2017 04 21

how can I extract the row with index=4 and then get the weight?

I cannot use idxmax because the df are not floats but str.

Upvotes: 1

Views: 36

Answers (1)

cs95
cs95

Reputation: 402263

You cannot use idxmax, but a workaround is to use NumPy's argmax with iloc:

df2 = df.query('Name == "Hans"')
# older versions
# df2.iloc[[df['date'].values.argmax()]]
# >=0.24
df2.iloc[[df['date'].to_numpy().argmax()]]

   Name        date     w
4  Hans  2019-03-21  66.5

Another trick is to convert the date to integer using to_datetime. You can then use idxmax with loc as usual.

df2.loc[[pd.to_datetime(df2['date']).astype(int).idxmax()]]

   Name        date     w
4  Hans  2019-03-21  66.5

To do this for each person, use GroupBy.idxmax:

df.iloc[pd.to_datetime(df.date).astype(int).groupby(df['Name']).idxmax().values]

   Name        date     w
5  Hans  2018-03-12  57.4
2  Mike  2018-11-12  88.1

Upvotes: 1

Related Questions