Linter

Reputation: 127

Create random groupings from list

I need to take a list of over 500 people and place them into groups of 15. The groups should be randomized so that we don't end up with groups where everyone's last name begins with "B", for example. But I also need to balance each group of 15 for gender parity as closely as possible. The list is in a 'students.csv' file with this structure:


Last, First, ID, Sport, Gender, INT
James, Frank, f99087, FOOT, m, I
Smith, Sally, f88329, SOC, f, 
Cranston, Bill, f64928, ,m,

I was looking for some kind of solution in pandas, but I have limited coding knowledge. The code I've got so far just explores the data a bit.

import pandas as pd
data = pd.read_csv('students.csv', index_col='ID', skipinitialspace=True)  # strip the space after each comma
print(data)

print(data.Gender.value_counts())

Upvotes: 0

Views: 1588

Answers (2)

user10325516

An approach using only pandas: shuffle the rows, then cut them into groups of 15 members, with the remainder in the very last group. The gender ratio in each group is only roughly the same, as far as the random shuffle allows; nothing here enforces it.

import pandas as pd

df = pd.read_csv('1.csv', skipinitialspace=True) # 1.csv contains sample data from the question

# shuffle data / pandas way
df = df.sample(frac=1).reset_index(drop=True)

# group size
SIZE = 15

# create column with group number
df['group'] = df.index // SIZE

# list of groups, groups[0] is dataframe with the first group members
groups = [
    df[df['group'] == num]
    for num in range(df['group'].max() + 1)]

Save dataframe to file:

# one csv-file
df.to_csv('2.csv')

# many csv-files
for num, group_df in enumerate(groups, 1):
    group_df.to_csv('group_{}.csv'.format(num))
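If the gender balance needs to be enforced rather than left to chance, one possible variation is to shuffle, sort by gender with a stable sort, and then deal the rows out to groups like cards. This is only a sketch: the function name `assign_balanced_groups` and its parameters are made up for illustration, and it assumes a `Gender` column as in the question. Note that it produces groups whose sizes differ by at most one (for example 14 or 15) instead of full groups plus one small leftover group.

```python
import numpy as np
import pandas as pd


def assign_balanced_groups(df, size=15, gender_col="Gender", seed=None):
    """Return a shuffled copy of df with a 'group' column balanced by gender_col."""
    n_groups = -(-len(df) // size)  # ceiling division: number of groups needed
    # shuffle all rows first so the order inside each gender is random
    shuffled = df.sample(frac=1, random_state=seed)
    # a stable sort keeps that random order within each gender
    ordered = shuffled.sort_values(gender_col, kind="mergesort").reset_index(drop=True)
    # deal rows out round-robin: row i goes to group i % n_groups
    ordered["group"] = np.arange(len(ordered)) % n_groups
    return ordered


if __name__ == "__main__":
    # made-up data standing in for students.csv
    demo = pd.DataFrame({"ID": range(100), "Gender": ["m"] * 52 + ["f"] * 48})
    result = assign_balanced_groups(demo, size=15, seed=0)
    print(result.groupby(["group", "Gender"]).size().unstack(fill_value=0))
```

Because each gender occupies one contiguous run of rows after sorting, dealing round-robin gives every group a count of each gender that differs from any other group's by at most one.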

Upvotes: 0

Green Cloak Guy

Reputation: 24691

First thing I would do is filter into two lists, one for each gender:

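# iterating a DataFrame directly yields column names, so walk the rows with itertuples()
males = [row for row in data.itertuples() if row.Gender == 'm']
females = [row for row in data.itertuples() if row.Gender == 'f']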

Next, shuffle the orders of the lists, to make it easier to select "randomly" while actually not having to choose random indices:

import random

random.shuffle(males)
random.shuffle(females)

Then choose elements for each group, trying to stay more or less in line with the overall gender ratio:

import math
import random

# establish number of groups, and size of each group
GROUP_SIZE = 15
total = len(males) + len(females)
GROUP_NUM = math.ceil(total / GROUP_SIZE)
# males per full group, computed once from the overall gender ratio
# (must be an integer, and must not drift as the lists shrink)
MALES_PER_GROUP = round(len(males) / total * GROUP_SIZE)
# make an empty list of groups to add each group to
groups = []
while len(groups) < GROUP_NUM:
    # select people from the previously-shuffled lists, never more than remain
    males_in_this_group = [males.pop() for _ in range(min(MALES_PER_GROUP, len(males)))]
    num_females = min(GROUP_SIZE - len(males_in_this_group), len(females))
    females_in_this_group = [females.pop() for _ in range(num_females)]
    # if the females ran out, top the group up with any remaining males
    shortfall = GROUP_SIZE - len(males_in_this_group) - len(females_in_this_group)
    males_in_this_group += [males.pop() for _ in range(min(shortfall, len(males)))]
    # put those two subsets together, shuffle to make it feel more random, and add this group
    this_group = males_in_this_group + females_in_this_group
    random.shuffle(this_group)
    groups.append(this_group)

This will ensure that the gender ratio in each group is as true to the original sample as possible. The last group will, of course, be smaller than the others, and will contain "whatever's left" from the other groups.
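To check how balanced the result actually is, the genders in each group can be counted. The following is a small sketch, not part of the approach above: `gender_counts` is a hypothetical helper, the `Student` tuple and sample names are made up, and it assumes each person is an object with a `Gender` attribute.

```python
from collections import Counter, namedtuple


def gender_counts(groups, get_gender=lambda person: person.Gender):
    """Return one Counter of genders per group, in group order."""
    return [Counter(get_gender(person) for person in group) for group in groups]


# made-up stand-in for the rows taken from students.csv
Student = namedtuple("Student", ["Last", "Gender"])
sample_groups = [
    [Student("James", "m"), Student("Smith", "f"), Student("Cranston", "m")],
    [Student("Lopez", "f"), Student("Chen", "f")],
]

for number, counts in enumerate(gender_counts(sample_groups), 1):
    print(number, dict(counts))
```

Passing a different `get_gender` function (for example `lambda p: p["Gender"]`) would let the same helper work on dictionaries instead of tuples.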

Upvotes: 1
