Group and randomize data based on condition

Question

I need help with the following. I want to randomize a data set like the one in the input below. In the output, the time column stays the same, but the non-zero values in the "S" column are moved to random positions while each run of consecutive non-zero values keeps its internal order. In the example input, that means randomly placing the groups (300, 200, 325, 411) and (450, 346, 250).

The Input:

time	S
2:30	0
2:35	0
2:40	300
2:45	200
2:50	325
2:55	411
3:00	0
3:05	0
3:10	450
3:15	346
3:20	250
3:25	0
3:30	0
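To experiment with this, the input can be loaded into a pandas DataFrame. This is a minimal sketch that assumes pandas (the question itself does not name a library, although the answer below uses it) and keeps the time values as plain strings.

```python
import pandas as pd

# Example input from the question; time is stored as plain strings (an assumption)
df = pd.DataFrame({
    "time": ["2:30", "2:35", "2:40", "2:45", "2:50", "2:55", "3:00",
             "3:05", "3:10", "3:15", "3:20", "3:25", "3:30"],
    "S": [0, 0, 300, 200, 325, 411, 0, 0, 450, 346, 250, 0, 0],
})
print(df)
```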

This is what I was thinking:

Steps:

1. Group the non-zero sequences. This is the part I can't quite figure out.
2. Randomize the order of the groups.

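import random

# groups of consecutive non-zero values (building this list from S is the open question)
groups = [[300, 200, 325, 411], [450, 346, 250]]
random.shuffle(groups)  # shuffles the list of groups in place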
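The grouping step can be sketched in plain Python with itertools.groupby. This is one possible approach under stated assumptions, not the accepted answer's code: it assumes S is available as a plain list, keeps each non-zero run intact, and treats every zero as its own unit so zeros can land anywhere. The function name shuffle_nonzero_runs is hypothetical.

```python
import random
from itertools import groupby


def shuffle_nonzero_runs(values, seed=None):
    """Shuffle runs of non-zero values as blocks; zeros are shuffled individually.

    shuffle_nonzero_runs is a hypothetical helper, not a library function.
    """
    rng = random.Random(seed)
    units = []
    # groupby splits the list into consecutive runs of zero / non-zero values
    for is_nonzero, run in groupby(values, key=lambda v: v != 0):
        run = list(run)
        if is_nonzero:
            units.append(run)               # keep the run together, in order
        else:
            units.extend([v] for v in run)  # one unit per zero
    rng.shuffle(units)                      # randomize the order of the units
    return [v for unit in units for v in unit]


S = [0, 0, 300, 200, 325, 411, 0, 0, 450, 346, 250, 0, 0]
print(shuffle_nonzero_runs(S))
```

Because only the S values move, the result can be written straight back next to the unchanged time column, for example df["S"] = shuffle_nonzero_runs(df["S"].tolist()) if the data sits in a DataFrame named df (an assumption). Note that two non-zero runs may end up adjacent, since nothing forces a zero between them.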

Note: if you think there's another way to approach the problem, I'm all ears.

Possible Output:

time	S
2:30	300
2:35	200
2:40	325
2:45	411
2:50	0
2:55	0
3:00	450
3:05	346
3:10	250
3:15	0
3:20	0
3:25	0
3:30	0

jezrael · Accepted Answer

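The idea is to create consecutive groups: compare S with 0, combine that mask with its shifted copy using | (bitwise OR), and number the groups with Series.cumsum, so that every zero gets its own label and each run of non-zero values shares one label. Then change the order of the groups randomly: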

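import numpy as np
import pandas as pd

m = df.S.eq(0)                                # True where S is zero
s = (m | m.shift(fill_value=False)).cumsum()  # one label per zero and per non-zero run
ids = s.unique()
np.random.shuffle(ids)                        # shuffle the group labels in place
df = df.set_index(s).loc[ids].reset_index(drop=True)

print(df)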
    time    S
0   2:30    0
1   2:40  300
2   2:45  200
3   2:50  325
4   2:55  411
5   3:25    0
6   3:10  450
7   3:15  346
8   3:20  250
9   3:05    0
10  2:35    0
11  3:00    0
12  3:30    0
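In the output above the time values move together with their rows, whereas the question asks for the time column to stay fixed. A minimal variant of the same labelling idea, assuming the example data and NumPy's Generator API, reorders only the S values and writes them back, leaving time in place. It also prints the intermediate group labels so the cumsum step can be checked against the input.

```python
import numpy as np
import pandas as pd

# Example input from the question (time kept as plain strings)
df = pd.DataFrame({
    "time": ["2:30", "2:35", "2:40", "2:45", "2:50", "2:55", "3:00",
             "3:05", "3:10", "3:15", "3:20", "3:25", "3:30"],
    "S": [0, 0, 300, 200, 325, 411, 0, 0, 450, 346, 250, 0, 0],
})

m = df["S"].eq(0)
# A new label starts at every zero and at the first value after a zero
s = (m | m.shift(fill_value=False)).cumsum()
print(s.tolist())  # [1, 2, 3, 3, 3, 3, 4, 5, 6, 6, 6, 7, 8]

rng = np.random.default_rng()
ids = rng.permutation(s.unique())  # random order of the group labels

# Reorder only the S values by group; the time column is left untouched
by_group = pd.Series(df["S"].to_numpy(), index=s.to_numpy())
df["S"] = by_group.loc[ids].to_numpy()
print(df)
```

Selecting with .loc on the non-unique label index returns all rows of each label in the shuffled label order, which is the same mechanism the answer relies on with set_index(s).loc[ids].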
