Add a column of normalised values based on sections of a dataframe column

Question

I have got a dataframe of several hundred thousand rows. Which is of the following format:

   time_elapsed  cycle
0          0.00      1
1          0.50      1
2          1.00      1
3          1.30      1
4          1.50      1
5          0.00      2
6          0.75      2
7          1.50      2
8          3.00      2

I want to create a third column that will give me the percentage of each time instance that the row is of the cycle (until the next time_elapsed = 0). To give something like:

   time_elapsed  cycle  percentage
0          0.00      1           0
1          0.50      1          33
2          1.00      1          75
3          1.30      1          87
4          1.50      1         100
5          0.00      2           0
6          0.75      2          25
7          1.50      2          50
8          3.00      2         100

I'm not fussed about the number of decimal places, I've just excluded them for ease here.

I started going along this route, but I keep getting errors.

data['percentage'] = data['time_elapsed'].sub(data.groupby(['cycle'])['time_elapsed'].transform(lambda x: x*100/data['time_elapsed'].max()))

I think it's the lambda function causing errors, but I'm not sure what I should do to change it. Any help is much appreciated :)

jezrael · Accepted Answer

Use Series.div for division instead sub for subtract, then solution is simplify - get only max per groups, multiple by Series.mul, if necessary Series.round and last convert to integers by Series.astype:

s = data.groupby(['cycle'])['time_elapsed'].transform('max')
data['percentage'] = data['time_elapsed'].div(s).mul(100).round().astype(int)
print (data)
   time_elapsed  cycle  percentage
0          0.00      1           0
1          0.50      1          33
2          1.00      1          67
3          1.30      1          87
4          1.50      1         100
5          0.00      2           0
6          0.75      2          25
7          1.50      2          50
8          3.00      2         100

Add a column of normalised values based on sections of a dataframe column

Answers (1)

Related Questions