Reputation: 1428
I have a data frame with the following structure:
>>> df
  name threshold  ...  time
0    a        no  ...   1.1
1    a         1  ...   1.5
2    b        no  ...   1.1
3    a         2  ...   1.5
...
For each name (groupby), I'd like to find the row where df['threshold'] == 'no' and divide the time of every row in the same group (a, b, etc.) by that row's time. I'd like to preserve the rest of the dataframe as it was. I was not able to find a way to do this with df.apply:
df.groupby(['name']).apply(lambda x: x['threshold'])
After that, I can't apply df.where to the result, and I can't quite express these multiple conditions with df.apply.
So the answer should group by name, find the row where threshold is no, take the corresponding time value, and divide the time of every row in the same group by it. Note that there is exactly one no per group.
Thanks for any suggestions.
Upvotes: 0
Views: 27
Reputation: 61930
IIUC, you could do:
df['no_time'] = df['threshold'].eq('no') * df['time']
df['time'] = df['time'] / df.groupby('name')['no_time'].transform('max')
res = df.drop('no_time', axis=1)
print(res)
Output
  name threshold      time
0    a        no  1.000000
1    a         1  1.363636
2    b        no  1.000000
3    a         2  1.363636
The first step:
df['no_time'] = df['threshold'].eq('no') * df['time']
creates a new column that holds the time value on rows where threshold equals no and 0 everywhere else.
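To make the first step concrete, here is a minimal sketch on a four-row frame shaped like the one in the question. The elided columns are left out, and the standalone names mask and no_time are used only for illustration:

```python
import pandas as pd

# Sample rows taken from the question; the elided ("...") columns are omitted.
df = pd.DataFrame({
    "name": ["a", "a", "b", "a"],
    "threshold": ["no", 1, "no", 2],
    "time": [1.1, 1.5, 1.1, 1.5],
})

# Boolean mask: True only on the rows whose threshold is the string "no".
mask = df["threshold"].eq("no")

# True behaves as 1 and False as 0, so only the "no" rows keep their time.
no_time = mask * df["time"]
print(no_time)
```

Multiplying a boolean Series by a numeric one treats True as 1 and False as 0, which is why every non-matching row ends up as 0.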
The second step has two parts. Part 2.1,
df.groupby('name')['no_time'].transform('max')
finds the maximum of the new column (no_time) within each group, i.e. the value of time where threshold equals no, and broadcasts it back to every row of that group. This assumes time is always positive (or at least positive where threshold equals no).
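If time can be zero or negative on the no rows, taking the maximum no longer picks the right value. One possible variant, sketched here under the assumption that each group has exactly one no row, masks the other rows to NaN and broadcasts the first non-null value per group. The name ref is a hypothetical helper, and the sample values are made up for illustration:

```python
import pandas as pd

# Made-up data: group "b" has a negative time on its "no" row.
df = pd.DataFrame({
    "name": ["a", "a", "b", "b"],
    "threshold": ["no", 1, "no", 2],
    "time": [1.1, 1.5, -2.0, 4.0],
})

mask = df["threshold"].eq("no")

# Keep time only on the "no" rows (NaN elsewhere), then broadcast the first
# non-null value of each group back to every row of that group.
ref = df["time"].where(mask).groupby(df["name"]).transform("first")

df["time"] = df["time"] / ref
print(df)
```

Because the non-matching rows are NaN rather than 0, the sign of time no longer affects which value is selected.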
The final part just divides the df['time'] column by the result of the previous step (2.1).
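Putting the steps together, the following is a self-contained sketch of the approach above on the sample rows from the question. The function name divide_by_no_time is hypothetical, the elided columns are omitted, and unlike the snippet above it returns a new frame instead of modifying df in place:

```python
import pandas as pd


def divide_by_no_time(df: pd.DataFrame) -> pd.DataFrame:
    """Divide each row's time by the time of its group's 'no' row.

    Assumes exactly one 'no' row per name and a positive time on that row.
    """
    # Step 1: time on the "no" rows, 0 everywhere else.
    no_time = df["threshold"].eq("no") * df["time"]
    # Step 2.1: per-group maximum, broadcast back to the original row order.
    per_group = no_time.groupby(df["name"]).transform("max")
    # Step 2.2: divide and return a copy; the input frame is left untouched.
    return df.assign(time=df["time"] / per_group)


df = pd.DataFrame({
    "name": ["a", "a", "b", "a"],
    "threshold": ["no", 1, "no", 2],
    "time": [1.1, 1.5, 1.1, 1.5],
})

res = divide_by_no_time(df)
print(res)
```

Using assign avoids the temporary no_time column and the later drop, at the cost of building a copy of the frame.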
Upvotes: 1