Reputation: 125
In the following dataframa I would like to substract the max value of u
from each user to the value of u
with the corresponding row of the max value of t
. So it should be like 21 (u-max) - 18 (u-value of t-max).
The dataframe is grouped by ['user','t']
user t u
1 0.0 -1.14
2.30 2.8
2.37 9.20
2.40 21
2.45 18
2 ... ...
If t
wasn't part of the index, I would have used something like df.groupby().agg({'u':'max'})
and df.groupby().agg({'t':'max'})
,but since it isn't, I don't know how I could use agg()
on t
(edit)
I found out that I can use df.reset_index(level=['t'], inplace=True)
to change t into a column, but now I realise that if I would use
df.groupby(['user']).agg({"t":'max'})
, that the corresponding u
values would be missing
The goal is to create a new dataframe that contains the values like this:
user (U_max - U_tmax)
1 3
2 ...
Upvotes: 1
Views: 371
Reputation: 2304
Let's start by re-creating a dataframe similar to yours, with the below code:
import pandas as pd
import numpy as np
cols = ['user', 't', 'u']
df = pd.DataFrame(columns=cols)
size = 10
df['user'] = np.random.randint(1,3, size=size)
df['t'] = np.random.uniform(0.0,3.0, size=size)
df = df.groupby(['user','t']).sum()
df['u'] = np.random.randint(-30,30, size=len(df))
print(df)
The output is something like:
u
user t
1 0.545562 19
0.627296 23
0.945533 -13
1.697278 -18
1.904453 -10
2.008375 5
2.296342 -2
2 0.282291 14
1.461548 -6
2.594966 -19
The first thing we'll need to do in order to work on this df is to reset the index, so:
df = df.reset_index()
Now we have all our columns back and we can use them to apply our final groupby()
function.
We can start by grouping by user
, which is what we need, specifying u
and t
as columns, so that we can access them in a lambda function.
In this lambda function, we will subtract from the max
value of u
and the corresponding u
value for the max
value of t
.
So, the max value of u
must be something like:
x['u'].max()
And the u
value of max
of t
should look like:
x['u'][x['t'].idxmax()])
So as you can see we've found the index for the max value of t
, and used it to slice x['u']
.
Here is the final code:
df = df.reset_index()
df = df.groupby(['user'])['u', 't'].apply(lambda x: (x['u'].max() - x['u'][x['t'].idxmax()]) )
print(df)
Final output:
user
1 25
2 33
Gross Error Check:
max
of u
for user 1
is 23max
of t
for user 1
is 2.296342
, and the corresponding u
is -223 - (-2)
= 25
max
of u
for user 2
is 14max
of t
for user 2
is 2.594966
, and the corresponding u
is -1914 - (-19)
= 33
Bonus tip: If you'd like to rename the returned column from groupby
, use reset_index()
along with set_index()
after the groupby operation:
df = df.reset_index(name='(U_max - U_tmax)').set_index('user')
It will yield:
(U_max - U_tmax)
user
1 25
2 33
Upvotes: 1