Reputation: 85
I need help randomizing a data set like the one in the input below. In the output, the time column should stay the same, while the non-zero values in the "S" column are randomly redistributed, with each run of consecutive non-zero values kept together in its original order. In the example input, that means the groups (300, 200, 325, 411) and (450, 346, 250) are moved around as whole blocks.
The Input:
time | S |
---|---|
2:30 | 0 |
2:35 | 0 |
2:40 | 300 |
2:45 | 200 |
2:50 | 325 |
2:55 | 411 |
3:00 | 0 |
3:05 | 0 |
3:10 | 450 |
3:15 | 346 |
3:20 | 250 |
3:25 | 0 |
3:30 | 0 |
This is what I was thinking...
Steps:
1. Group the non-zero sequences (this is the part I can't quite figure out).
2. Randomize the order of the groups:
import random
random.shuffle(groups)
Note: If you think there's another way to approach the problem, I'm all ears.
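One way the grouping step could be sketched in plain Python, assuming the S column is available as a list. The names `s_values` and `split_into_groups` are made up for illustration. `itertools.groupby` splits the values into runs of zeros and non-zeros; each zero becomes its own one-element group, and each non-zero run stays together. Shuffling the groups and flattening them gives a new S column to place next to the unchanged time column.

```python
import random
from itertools import groupby

# S column from the example input (assumed to be a plain list here)
s_values = [0, 0, 300, 200, 325, 411, 0, 0, 450, 346, 250, 0, 0]


def split_into_groups(values):
    """Return runs of consecutive non-zeros as lists; each zero is its own group."""
    groups = []
    for is_zero, run in groupby(values, key=lambda v: v == 0):
        run = list(run)
        if is_zero:
            # every zero can move independently
            groups.extend([v] for v in run)
        else:
            # keep the non-zero sequence intact and in order
            groups.append(run)
    return groups


groups = split_into_groups(s_values)
random.shuffle(groups)  # shuffle whole groups, not individual values

# flatten back into one column; the time column is not touched
shuffled = [v for group in groups for v in group]
print(shuffled)
```

The result differs from run to run, so no fixed output is shown; call `random.seed(...)` first if a reproducible order is needed.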
Possible Output:
time | S |
---|---|
2:30 | 300 |
2:35 | 200 |
2:40 | 325 |
2:45 | 411 |
2:50 | 0 |
2:55 | 0 |
3:00 | 450 |
3:05 | 346 |
3:10 | 250 |
3:15 | 0 |
3:20 | 0 |
3:25 | 0 |
3:30 | 0 |
Upvotes: 1
Views: 201
Reputation: 863226
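The idea is to create groups of consecutive values: compare the column with 0, chain the mask with its shifted version using | (bitwise OR), and take Series.cumsum to get one group label per row. Then reorder the rows by a random permutation of the group labels (whole rows move, so the time values travel with their S values):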
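import numpy as np

# True where S is zero
m = df.S.eq(0)
# a new group starts at every zero and at the first row after a zero
s = (m | m.shift()).cumsum()
# shuffle the group labels, then reorder the rows group by group
ids = s.unique()
np.random.shuffle(ids)
df = df.set_index(s).loc[ids].reset_index(drop=True)
print(df)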
time S
0 2:30 0
1 2:40 300
2 2:45 200
3 2:50 325
4 2:55 411
5 3:25 0
6 3:10 450
7 3:15 346
8 3:20 250
9 3:05 0
10 2:35 0
11 3:00 0
12 3:30 0
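If the time column has to stay in place, as in the question's possible output, a variant along the same lines might shuffle only the S values and write them back next to the original times. This is a sketch rather than part of the answer above. The frame is rebuilt from the question's input, and the names `labels`, `order` and `out` are illustrative. It also assumes two small changes: `shift(fill_value=False)` to avoid the NaN in the first row, and `np.random.default_rng` in place of `np.random.shuffle`.

```python
import numpy as np
import pandas as pd

# example input from the question
df = pd.DataFrame({
    "time": ["2:30", "2:35", "2:40", "2:45", "2:50", "2:55", "3:00",
             "3:05", "3:10", "3:15", "3:20", "3:25", "3:30"],
    "S": [0, 0, 300, 200, 325, 411, 0, 0, 450, 346, 250, 0, 0],
})

# True where S is zero
m = df["S"].eq(0)
# one label per row: every zero gets its own label,
# every run of non-zero values shares a label
labels = (m | m.shift(fill_value=False)).cumsum().rename("group")

# random order of the group labels
rng = np.random.default_rng()
order = rng.permutation(labels.unique())

# pull the S values out group by group in the new order ...
shuffled_s = df.set_index(labels).loc[order, "S"].to_numpy()
# ... and put them next to the original, unshuffled time column
out = df.assign(S=shuffled_s)
print(out)
```

For the example input, `labels` comes out as 1, 2, 3, 3, 3, 3, 4, 5, 6, 6, 6, 7, 8, so the two non-zero runs are groups 3 and 6 and each zero is a group of its own. Passing a seed to `default_rng` makes the order reproducible.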
Upvotes: 1