Shuffling rows in pandas but orderly

Question

Say I have a DataFrame with three columns: Age, Gender, and Country.

I want to randomly shuffle this data but in an ordered fashion according to gender. There are n males and m females, where n could be less than, greater than, or equal to m. The shuffling should happen in such a way that we get the following results for a size of 8 people:

male, female, male, female, male, female, female, female, ... (if there are more females: m > n)
male, female, male, female, male, male, male, male (if there are more males: n > m)
male, female, male, female, male, female, male, female, male, female (if equal males and females: n = m)

import pandas as pd

df = pd.DataFrame({'Age': [10, 20, 30, 40, 50, 60, 70, 80],
                   'Gender': ["Male", "Male", "Male", "Female", "Female", "Male", "Female", "Female"],
                   'Country': ["US", "UK", "China", "Canada", "US", "UK", "China", "Brazil"]})

John Zwinck · Accepted Answer

First add the sequence numbers within each group:

df['Order'] = df.groupby('Gender').cumcount()

Then sort:

df.sort_values('Order')

It gives you:

   Age  Gender Country  Order
0   10    Male      US      0
3   40  Female  Canada      0
1   20    Male      UK      1
4   50  Female      US      1
2   30    Male   China      2
6   70  Female   China      2
5   60    Male      UK      3
7   80  Female  Brazil      3
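
In the output above, Male comes before Female within each Order value only because that is how the rows happened to be arranged in the original frame, and the default sort algorithm of sort_values is not guaranteed to be stable. A minimal sketch of one way to make the male-first ordering explicit is to sort on both columns. It assumes the labels are exactly "Male" and "Female", so that a descending sort on Gender puts "Male" first; the name interleaved is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Age': [10, 20, 30, 40, 50, 60, 70, 80],
    'Gender': ["Male", "Male", "Male", "Female", "Female", "Male", "Female", "Female"],
    'Country': ["US", "UK", "China", "Canada", "US", "UK", "China", "Brazil"],
})

# Position of each row within its own gender group: 0, 1, 2, ...
df['Order'] = df.groupby('Gender').cumcount()

# Sort by Order first, then by Gender in descending order.
# Assumption: "Male" > "Female" alphabetically, so descending puts Male first.
interleaved = df.sort_values(['Order', 'Gender'], ascending=[True, False])

print(interleaved)
```

For other labels, or more than two groups, an ordered categorical column would be a more general tie-breaker than relying on alphabetical order.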

If you want to shuffle, do that at the very beginning, before computing Order, e.g. df = df.sample(frac=1) (see the related question "Shuffle DataFrame rows").
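
Putting the steps together, here is a rough sketch of the whole pipeline for the unequal case from the question (more females than males). The helper name shuffle_interleaved, its seed parameter, and the sample data are assumptions made for illustration; they are not part of the answer. The descending sort on the group column is a tie-breaker that relies on "Male" sorting after "Female" alphabetically.

```python
import pandas as pd

def shuffle_interleaved(df, column='Gender', seed=None):
    """Shuffle rows, then alternate the groups of `column` round by round."""
    # Shuffle first so the order within each group is random.
    shuffled = df.sample(frac=1, random_state=seed)
    # Number the rows within each group in their shuffled order.
    order = shuffled.groupby(column).cumcount()
    # Sort by round, then by group label descending (assumption: this
    # puts "Male" ahead of "Female"), and drop the helper column.
    return (shuffled.assign(Order=order)
                    .sort_values(['Order', column], ascending=[True, False])
                    .drop(columns='Order'))

# Hypothetical data: 2 males and 6 females.
people = pd.DataFrame({
    'Age': [10, 20, 30, 40, 50, 60, 70, 80],
    'Gender': ["Male", "Male", "Female", "Female",
               "Female", "Female", "Female", "Female"],
    'Country': ["US", "UK", "China", "Canada", "US", "UK", "China", "Brazil"],
})

result = shuffle_interleaved(people, seed=0)
print(result['Gender'].tolist())
```

Which individual rows land in each position depends on the seed, but the Gender sequence alternates until the smaller group runs out, after which the remaining rows of the larger group follow.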
