jake2389
jake2389

Reputation: 1174

Split into groups based on (multiple) conditions?

I have set of marbles, of different colors and weights, and I want to split them into groups based on their weight and color.

The conditions are:

  1. A group cannot weigh more than 100 units
  2. A group cannot have more than 5 different-colored marbles.

A reproducible example:

marbles <- data.frame(color=sample(1:20, 20), weight=sample(1:40, 20, replace=T))

   color weight
1      1     22
2     15     33
3     13     35
4     11     13
5      6     26
6      8     15
7     10      3
8     16     22
9     14     21
10     3     16
11     4     26
12    20     30
13     9     31
14     2     16
15     7     12
16    17     13
17    19     19
18     5     17
19    12     12
20    18     40

And what I want is this group column:

   color weight group
1      1     22     1
2     15     33     1
3     13     35     1
4     11     13     2
5      6     26     2
6      8     15     2
7     10      3     2
8     16     22     2
9     14     21     3
10     3     16     3
11     4     26     3
12    20     30     3
13     9     31     4
14     2     16     4
15     7     12     4
16    17     13     4
17    19     19     4
18     5     17     5
19    12     12     5
20    18     40     5

TIA.

Upvotes: 2

Views: 1462

Answers (2)

Onyambu
Onyambu

Reputation: 79338

in base R you could write a recursive function as shown below:

create_group = function(df,a){
  if(missing(a)) a = cumsum(df$weight)%/%100
  b = !ave(df$color,a,FUN=seq_along)%%6
  d = ave(df$weight,a+b,FUN=cumsum)>100
  a = a+b+d
  if (any(b|d)) create_group(df,a) else cbind(df,group = a+1)
}

create_group(df)
   color weight group
1      1     22     1
2     15     33     1
3     13     35     1
4     11     13     2
5      6     26     2
6      8     15     2
7     10      3     2
8     16     22     2
9     14     21     3
10     3     16     3
11     4     26     3
12    20     30     3
13     9     31     4
14     2     16     4
15     7     12     4
16    17     13     4
17    19     19     4
18     5     17     5
19    12     12     5
20    18     40     5

Upvotes: 0

RK1
RK1

Reputation: 2542

The below isn't an optimal assignment to the groups, it just does it sequentially through the data frame. It's uses rowwise and might not be the most efficient way as it's not a vectorized approach.

library(dplyr)

marbles <- data.frame(color=sample(1:20, 20), weight=sample(1:40, 20, replace=T))

Below I create a rowwise function which we can apply using dplyr

assign_group <- function(color, weight) {
  # Conditions
  clists = append(color_list, color)
  sum_val = group_sum + weight
  num_colors = length(unique(color_list))

  assign_condition = (sum_val <= 100 & num_colors <= 5)
  #assign globals
  cval <- if(assign_condition) clists else c(color)
  sval <- ifelse(assign_condition, sum_val, weight)
  gval <- ifelse(assign_condition, group_number, group_number + 1)
  assign("color_list", cval, envir = .GlobalEnv)
  assign("group_sum", sval, envir = .GlobalEnv)
  assign("group_number", gval, envir = .GlobalEnv)
  res = group_number
  return(res)
}

I then setup a few global variables to track the allocation of the marbles to each group.

# globals
color_list <<- c()
group_sum <<- 0
group_number <<- 1

Finally run this function using mutate

test <- marbles %>% rowwise() %>% mutate(group = assign_group(color,weight)) %>% data.frame() 

Which results in the below

 color weight group
1      6     27     1
2     12     16     1
3     15     32     1
4     20     25     1
5     19      5     2
6      2     21     2
7     16     39     2
8     17      4     2
9     11     16     2
10     7      7     3
11    10      5     3
12     1     30     3
13    13      7     3
14     9     39     3
15    14      7     4
16     8     17     4
17    18      9     4
18     4     36     4
19     3      1     4
20     5      3     5

And seems to meet the constraints

test %>% group_by(group) %>% summarise(tot_w = sum(weight), n_c = length(unique(color)) )

  group tot_w   n_c
  <dbl> <int> <int>
1     1   100     4
2     2    85     5
3     3    88     5
4     4    70     5
5     5     3     1

Upvotes: 1

Related Questions