Reputation: 4774
The dataFrame
looks something like this: Name of person and weight at a given date.
Name date w
1 Mike 2019-01-21 89.1
2 Mike 2018-11-12 88.1
3 Mike 2018-03-14 87.2
4 Hans 2019-03-21 66.5
5 Hans 2018-03-12 57.4
6 Hans 2017-04-21 55.3
7 Hans 2016-10-12 nan
I want to select the last time Hans
has logged in his weight. So the answer would be
4 Hans 2019-03-21 66.5
Here's what I successfully managed to do:
# select Hans data that don't have nans
cond = ( data['Name'] == 'Hans' )
a = data.loc[ cond ]
a = a.dropna()
# get the index of the most recent weight
b = d['date'].str.split('-', expand=True) # split the date to get the year
now b
looks like this
print(b)
#4 2019 03 21
#5 2018 03 12
#6 2017 04 21
how can I extract the row with index=4
and then get the weight?
I cannot use idxmax
because the df
are not floats
but str
.
Upvotes: 1
Views: 36
Reputation: 402263
You cannot use idxmax
, but a workaround is to use NumPy's argmax
with iloc
:
df2 = df.query('Name == "Hans"')
# older versions
# df2.iloc[[df['date'].values.argmax()]]
# >=0.24
df2.iloc[[df['date'].to_numpy().argmax()]]
Name date w
4 Hans 2019-03-21 66.5
Another trick is to convert the date to integer using to_datetime
. You can then use idxmax
with loc
as usual.
df2.loc[[pd.to_datetime(df2['date']).astype(int).idxmax()]]
Name date w
4 Hans 2019-03-21 66.5
To do this for each person, use GroupBy.idxmax
:
df.iloc[pd.to_datetime(df.date).astype(int).groupby(df['Name']).idxmax().values]
Name date w
5 Hans 2018-03-12 57.4
2 Mike 2018-11-12 88.1
Upvotes: 1