Reputation: 184
I have got a dataframe of several hundred thousand rows. Which is of the following format:
time_elapsed cycle
0 0.00 1
1 0.50 1
2 1.00 1
3 1.30 1
4 1.50 1
5 0.00 2
6 0.75 2
7 1.50 2
8 3.00 2
I want to create a third column that will give me the percentage of each time instance that the row is of the cycle (until the next time_elapsed = 0). To give something like:
time_elapsed cycle percentage
0 0.00 1 0
1 0.50 1 33
2 1.00 1 75
3 1.30 1 87
4 1.50 1 100
5 0.00 2 0
6 0.75 2 25
7 1.50 2 50
8 3.00 2 100
I'm not fussed about the number of decimal places, I've just excluded them for ease here.
I started going along this route, but I keep getting errors.
data['percentage'] = data['time_elapsed'].sub(data.groupby(['cycle'])['time_elapsed'].transform(lambda x: x*100/data['time_elapsed'].max()))
I think it's the lambda function causing errors, but I'm not sure what I should do to change it. Any help is much appreciated :)
Upvotes: 1
Views: 37
Reputation: 862731
Use Series.div
for division instead sub
for subtract, then solution is simplify - get only max
per groups, multiple by Series.mul
, if necessary Series.round
and last convert to integers by Series.astype
:
s = data.groupby(['cycle'])['time_elapsed'].transform('max')
data['percentage'] = data['time_elapsed'].div(s).mul(100).round().astype(int)
print (data)
time_elapsed cycle percentage
0 0.00 1 0
1 0.50 1 33
2 1.00 1 67
3 1.30 1 87
4 1.50 1 100
5 0.00 2 0
6 0.75 2 25
7 1.50 2 50
8 3.00 2 100
Upvotes: 3