harry04
harry04

Reputation: 962

Working on a subset of a dataframe with column condition

From a dataframe df I want to update the value of a column Points for the top 3 values of another column Time after sorting the Time column in ascending order, such that

df['Points'] = df['Points'] * 1.3 for the first row (smallest Time)

df['Points'] = df['Points'] * 1.2 for the second row (second smallest Time)

df['Points'] = df['Points'] * 1.1 for the third row (third smallest Time) rounded to the nearest integer.

and Points for all other rows remains the same.

I have to do this for every unique value for a third column value Challenge. How can I do this?

So, I need PointsA instead of Points from below -

Challenge      Team              Time              Points   PointsA 
   A             1    2019-11-05 23:00:43.07589     200       260
   B             3    2019-11-05 22:10:55.07589     100       130
   A             5    2019-11-05 23:05:43.07589     200       240
   A             7    2019-11-05 23:07:33.07589     200       220
   B            10    2019-11-05 22:20:13.07589     100       120
   C             4    2019-11-06 00:05:22.07589      50        65
   A             4    2019-11-05 23:18:23.07589     200       200

I've tried something like -

for challenge in df['Challenge'].unique():
     df[df['Challenge'] == challenge].sort_values('Time', ascending=True).head(1)['Points'] *= 1.3

but that doesn't seem to work.

Upvotes: 2

Views: 298

Answers (2)

Andy L.
Andy L.

Reputation: 25239

Try this. Use value_counts and items to get each challenge and length of them. Use these length to narrow on assignment of challenge

val = [1.3, 1.2, 1.1]
df.Time = pd.to_datetime(df.Time)
for challenge, i in df['Challenge'].value_counts().items():
    df.loc[df[df['Challenge'] == challenge].nsmallest(3, 'Time').index, 'Points'] *= val[:i]

Out[201]:
  Challenge  Team                       Time  Points  PointsA
0         A     1 2019-11-05 23:00:43.075890   260.0       260
1         B     3 2019-11-05 22:10:55.075890   130.0       130
2         A     5 2019-11-05 23:05:43.075890   240.0       240
3         A     7 2019-11-05 23:07:33.075890   220.0       220
4         B    10 2019-11-05 22:20:13.075890   120.0       120
5         C     4 2019-11-06 00:05:22.075890    65.0        65
6         A     4 2019-11-05 23:18:23.075890   200.0       200

As Challenge = 'C' has one row and it got calculated correctly from 50 to 65

Upvotes: 1

fmarm
fmarm

Reputation: 4284

Here is a way to do it

import pandas as pd
import numpy as np

# compute rank by challenge
df['rank_in_challenge'] = df.groupby('Challenge')['Time'].rank(method='first',ascending=True).astype('int')
# apply change in points
conditions  = [ df['rank_in_challenge']==1,df['rank_in_challenge']==2,df['rank_in_challenge']==3]
choices     = [ 1.3, 1.2, 1.1 ]
df["PointsA"] = np.select(conditions, choices, default=1.0)*df['Points'] 

Upvotes: 1

Related Questions