R. Simian

Reputation: 187

R function that evenly splits observations into groups

I have a 30 x 2 data frame (df) with one column containing the names of 30 individuals and the second column containing their ID#. I want to create a function in R that randomly and most evenly splits the 30 individuals into groups and can handle division with and without remainders.

To clarify, this function would:

• Take 2 parameters as arguments: the df and an integer representing the number of groups
• Give me back the original df but with an additional column having the group number that each person gets assigned to randomly
• If the number of people (rows) cannot be divided by the integer given, the remaining rows should be split as evenly as possible between the groups

For example:

• If I want the 30 people split into 1 group, my function should return df with a new column "group_no" that has 1 for every person (each person would be assigned to the same group)

• If I want 4 groups, I want to see 10 people assigned to 2 groups and the remaining 5 people assigned to another 2 groups.

• If I want 8 groups, then the function should give me 6 groups of 4 people and 2 groups of 3 and so on.
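For reference, "as evenly as possible" can be pinned down with integer division: every group gets nrow(df) %/% k people, and nrow(df) %% k of the groups get one extra. The sketch below only computes the target sizes; group_sizes is a hypothetical helper name, not something from the question. Note that under this rule 4 groups of 30 people come out as 8, 8, 7, 7 rather than 10, 10, 5, 5.

```r
# Hypothetical helper: target group sizes when n_people are split
# into n_groups as evenly as possible.
group_sizes <- function(n_people, n_groups) {
  base  <- n_people %/% n_groups  # size every group gets at minimum
  extra <- n_people %% n_groups   # number of groups that get one more
  rep(c(base + 1, base), times = c(extra, n_groups - extra))
}

group_sizes(30, 1)  # 30
group_sizes(30, 4)  # 8 8 7 7
group_sizes(30, 8)  # 4 4 4 4 4 4 3 3
```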

I've written some code that kind of does what I need, but I'm entering the groups manually, so I'm not sure how random or correct it is. I want to instead write all this in a function that can automatically perform these tasks:

#My code so far
#For 1 group of 30 people

people=1:30
groups=1
df$group_no <- print(sample(groups))

#For 4 groups (2 groups of 10 people and 2 groups of 5 people)
groups=c(rep(1,5), rep(2,5), rep(3,10), rep(4,10))
df$group_no <- print(sample(groups))

#For 7 groups (3 groups of 6 people and 4 groups of 3 people)
groups=c(rep(1,6), rep(2,6), rep(3,6), rep(4,3), rep(5,3), rep(6,3), rep(7,3))
df$group_no <- print(sample(groups))

#For 8 groups (6 groups of 4 people and 2 groups of 3 people)
groups=c(rep(1,4), rep(2,4), rep(3,4), rep(4,4), rep(5,4), rep(6,4), rep(7,3), rep(8,3))
df$group_no <- print(sample(groups))


#For 10 groups of 3 people each
groups=c(rep(1,3), rep(2,3), rep(3,3), rep(4,3), rep(5,3), rep(6,3), rep(7,3), rep(8,3), rep(9,3), rep(10,3))
df$group_no <- print(sample(groups))


fct_grouping <- function(df, nr_groups) {
 ????? 
}

Upvotes: 3

Views: 1622

Answers (3)

Lief Esbenshade

Reputation: 833

This function makes the group sizes as close to even as possible and randomizes group assignment.


grouper <- function(df, n) {

  # give each row a unique random rank from 1 to nrow(df)
  random <- sample(nrow(df))

  # scale the ranks onto 1..n; multiplying before dividing avoids
  # floating-point rounding at the group boundaries
  df$group_number <- ceiling(random * n / nrow(df))

  return(df)
}
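A quick way to check the claim is to tabulate the group sizes. The sketch below is only an illustration: the data frame is a hypothetical stand-in for the asker's 30 x 2 df, and grouper is repeated (with the scaling written multiply-first) so the snippet runs on its own.

```r
# Copy of grouper() so the snippet is self-contained; the scaling is
# written as random * n / nrow(df).
grouper <- function(df, n) {
  random <- sample(nrow(df))
  df$group_number <- ceiling(random * n / nrow(df))
  df
}

# Hypothetical stand-in for the asker's 30 x 2 data frame
df <- data.frame(name = paste0("person_", 1:30), id = 1:30)

set.seed(42)  # only to make the shuffle reproducible
out <- grouper(df, 4)

# Sizes differ by at most one: 7, 8, 7, 8 for groups 1 to 4.
# Which rows land in which group depends on the seed; the sizes do not.
sizes <- as.vector(table(out$group_number))
print(sizes)
```

The sizes are fixed because the ranks are always a permutation of 1 to 30; only the mapping of people to ranks is random.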

Upvotes: 2

David Jorquera

Reputation: 2102

I'm sure that what you are looking for can be programmed in R, but it's harder to model when the number of people divided by the number of groups leaves a nonzero remainder, because then there is more than one way to assign the leftover cases (think of what happens with 10 or more groups). Also, the examples you give don't meet the condition you require (group sizes as similar as possible). This is the closest thing I can think of; it only handles the case with no remainder:

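df <- data.frame(people = 1:30)

fct_grouping <- function(df, nr_groups) {
  if (nrow(df) %% nr_groups == 0) {
    # sample(nr_groups) is one random permutation of 1..nr_groups; cbind()
    # recycles it down the rows, so the pattern repeats every nr_groups rows
    print(cbind(df, sample(nr_groups)))
  } else {
    print("number of people is not a multiple of n")
  }
}

df2 <- fct_grouping(df, 5)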

#         people sample(nr_groups)
# 1       1                 1
# 2       2                 3
# 3       3                 2
# 4       4                 5
# 5       5                 4
# 6       6                 1
# 7       7                 3
# 8       8                 2
# 9       9                 5
# 10     10                 4
# 11     11                 1
# 12     12                 3
# 13     13                 2
# 14     14                 5
# 15     15                 4
# 16     16                 1
# 17     17                 3
# 18     18                 2
# 19     19                 5
# 20     20                 4
# 21     21                 1
# 22     22                 3
# 23     23                 2
# 24     24                 5
# 25     25                 4
# 26     26                 1
# 27     27                 3
# 28     28                 2
# 29     29                 5
# 30     30                 4
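One way to deal with the "more than one option" problem is to let chance decide which groups take the leftover people. The following is a minimal sketch under that assumption, not a definitive implementation; fct_grouping_any and the column name group_no are hypothetical names chosen for illustration.

```r
# Hypothetical variant that also handles a nonzero remainder
fct_grouping_any <- function(df, nr_groups) {
  n    <- nrow(df)
  base <- n %/% nr_groups  # members every group gets
  rest <- n %% nr_groups   # leftover people, always fewer than nr_groups

  # every group number `base` times, plus `rest` distinct groups drawn
  # at random to take one extra person each
  labels <- c(rep(seq_len(nr_groups), times = base),
              sample.int(nr_groups, rest))

  # shuffle by index; sample(labels) would misbehave on a length-one vector
  df$group_no <- labels[sample.int(n)]
  df
}

df <- data.frame(people = 1:30)
set.seed(1)  # only for reproducibility
out <- fct_grouping_any(df, 8)
table(out$group_no)  # six groups of 4 and two groups of 3
```

Unlike the recycled permutation above, each row's group is shuffled independently of its position, so there is no repeating pattern down the rows.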

Upvotes: 1

apeqqut

Reputation: 11

The following code should do what you asked. It returns a vector with the groupings; the vector is sorted, so shuffle it (for example with sample()) before assigning it to the rows.

fct_grouping <- function(df, nr_groups) {
    # size of every group before the leftover rows are handed out
    base_number <- floor(nrow(df) / nr_groups)
    # leftover rows; each of the first `rest` groups gets one extra member
    rest <- nrow(df) - base_number * nr_groups
    groupings <- sort(c(rep(seq_len(nr_groups), base_number), seq_len(rest)))
    return(groupings)
}
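As a usage sketch (the data frame and the group_no column name are assumptions for illustration), the returned vector can be inspected with table() and then shuffled onto the rows. The function is copied here so the snippet runs on its own.

```r
# Copy of the function from this answer
fct_grouping <- function(df, nr_groups) {
  base_number <- floor(nrow(df) / nr_groups)
  rest <- nrow(df) - base_number * nr_groups
  sort(c(rep(seq(nr_groups), base_number),
         if (rest == 0) numeric() else seq(rest)))
}

df <- data.frame(people = 1:30)

groupings <- fct_grouping(df, 7)
table(groupings)  # groups 1 and 2 have 5 members, groups 3 to 7 have 4

# The vector is sorted, so permute it before attaching it to the rows
set.seed(1)  # only for reproducibility
df$group_no <- groupings[sample.int(length(groupings))]
```

With this approach the extra members always go to the lowest-numbered groups; only the assignment of people to groups is random.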

Upvotes: 1
