codemastro

Reputation: 85

Group and randomize data based on condition

I need help randomizing a set of data like the one in the input below. In the output, the time column should stay the same, while the non-zero runs in the "S" column are randomly redistributed, with the order of the numbers inside each run preserved. In the example input, that means randomly placing the groups (300, 200, 325, 411) and (450, 346, 250) as whole blocks.

The Input:

time S
2:30 0
2:35 0
2:40 300
2:45 200
2:50 325
2:55 411
3:00 0
3:05 0
3:10 450
3:15 346
3:20 250
3:25 0
3:30 0

This is what I was thinking...

Steps:

  1. Grouping the non-zero sequences: this is the part I can't quite figure out

  2. Randomizing the group

import random

random.shuffle(groups)

Note: If you think there's another way to approach the problem, I'm all ears.

Possible Output:

time S
2:30 300
2:35 200
2:40 325
2:45 411
2:50 0
2:55 0
3:00 450
3:05 346
3:10 250
3:15 0
3:20 0
3:25 0
3:30 0

Upvotes: 1

Views: 201

Answers (1)

jezrael

Reputation: 863226

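The idea is to create consecutive groups: compare S with 0, combine that mask with its shifted version using | (element-wise OR), and take Series.cumsum so that every zero gets its own label and every non-zero run shares one label. Then reorder the rows by the shuffled group labels. Note that this moves whole rows, so the time values travel with their S values: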

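import numpy as np

# True where S is zero
m = df.S.eq(0)
# A new group starts at every zero and at the row right after a zero
s = (m | m.shift(fill_value=False)).cumsum()
ids = s.unique()
np.random.shuffle(ids)  # shuffle the group labels in place
df = df.set_index(s).loc[ids].reset_index(drop=True)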

print (df)
    time    S
0   2:30    0
1   2:40  300
2   2:45  200
3   2:50  325
4   2:55  411
5   3:25    0
6   3:10  450
7   3:15  346
8   3:20  250
9   3:05    0
10  2:35    0
11  3:00    0
12  3:30    0
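
If the time column has to stay in its original order, as the question asks, one possible variation is to shuffle only the S values block by block and write them back. This is a minimal sketch rather than a definitive solution: it rebuilds the question's sample data, uses the same group labels as above, and assumes NumPy's Generator API with an arbitrary seed; the names labels, shuffled_S and result are made up for illustration.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "time": ["2:30", "2:35", "2:40", "2:45", "2:50", "2:55", "3:00",
             "3:05", "3:10", "3:15", "3:20", "3:25", "3:30"],
    "S": [0, 0, 300, 200, 325, 411, 0, 0, 450, 346, 250, 0, 0],
})

# Same grouping as above: each zero is its own block,
# each non-zero run shares one label
m = df["S"].eq(0)
labels = (m | m.shift(fill_value=False)).cumsum()

# Shuffle the block labels (the seed is an arbitrary choice for repeatability)
rng = np.random.default_rng(0)
ids = rng.permutation(labels.unique())

# Concatenate the S values block by block in the shuffled order
shuffled_S = np.concatenate(
    [df.loc[labels == i, "S"].to_numpy() for i in ids]
)

# Only S is replaced; time keeps its original order
result = df.assign(S=shuffled_S)
print(result)
```

Because every zero is a block of its own, the two non-zero runs can end up next to each other after shuffling. If they must stay separated by at least one zero, the shuffle would need an extra constraint that this sketch does not implement.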

Upvotes: 1
