find max value for index in dataframe and latest value

Question

In the following dataframa I would like to substract the max value of u from each user to the value of u with the corresponding row of the max value of t. So it should be like 21 (u-max) - 18 (u-value of t-max). The dataframe is grouped by ['user','t']

user  t      u
1     0.0    -1.14
      2.30   2.8
      2.37   9.20
      2.40   21
      2.45   18
2     ...    ...

If t wasn't part of the index, I would have used something like df.groupby().agg({'u':'max'}) and df.groupby().agg({'t':'max'}) ,but since it isn't, I don't know how I could use agg()on t

(edit) I found out that I can use df.reset_index(level=['t'], inplace=True) to change t into a column, but now I realise that if I would use df.groupby(['user']).agg({"t":'max'}) , that the corresponding u values would be missing

The goal is to create a new dataframe that contains the values like this:

user    (U_max - U_tmax) 
1        3
2        ...

BiOS · Accepted Answer

Let's start by re-creating a dataframe similar to yours, with the below code:

import pandas as pd
import numpy as np

cols =  ['user', 't', 'u']
df  = pd.DataFrame(columns=cols)
size = 10

df['user'] = np.random.randint(1,3, size=size)
df['t'] = np.random.uniform(0.0,3.0, size=size)

df = df.groupby(['user','t']).sum()
df['u'] = np.random.randint(-30,30, size=len(df))

print(df)

The output is something like:

                u
user t           
1    0.545562  19
     0.627296  23
     0.945533 -13
     1.697278 -18
     1.904453 -10
     2.008375   5
     2.296342  -2
2    0.282291  14
     1.461548  -6
     2.594966 -19

The first thing we'll need to do in order to work on this df is to reset the index, so:

df = df.reset_index()

Now we have all our columns back and we can use them to apply our final groupby() function.

We can start by grouping by user, which is what we need, specifying u and t as columns, so that we can access them in a lambda function.

In this lambda function, we will subtract from the max value of u and the corresponding u value for the max value of t.

So, the max value of u must be something like:

x['u'].max()

And the u value of max of t should look like:

x['u'][x['t'].idxmax()])

So as you can see we've found the index for the max value of t, and used it to slice x['u'].

Here is the final code:

df = df.reset_index()

df = df.groupby(['user'])['u', 't'].apply(lambda x: (x['u'].max() - x['u'][x['t'].idxmax()]) )

print(df)

Final output:

user
1    25
2    33

Gross Error Check:

max of u for user 1 is 23
max of t for user 1 is 2.296342, and the corresponding u is -2

23 - (-2) = 25

max of u for user 2 is 14
max of t for user 2 is 2.594966, and the corresponding u is -19

14 - (-19) = 33

Bonus tip: If you'd like to rename the returned column from groupby, use reset_index() along with set_index() after the groupby operation:

df = df.reset_index(name='(U_max - U_tmax)').set_index('user')

It will yield:

      (U_max - U_tmax)
user                  
1                   25
2                   33

find max value for index in dataframe and latest value

Answers (1)

Related Questions