stats_noob
stats_noob

Reputation: 5897

Preventing a List from "Overflowing"

I thought of the following problem:

Suppose there are 5 balls:

enter image description here

I want to find out the number of ways that these 5 balls can be ordered such that:

In a previous question (Filtering a Data Frame based on Row Conditions), I learned how to first generate a list of all possible ways that these 5 balls can be ordered, and then only keep entries in this list that satisfy the above constraints:

# generate all possible combinations (120 combinations)

library(combinat)
library(dplyr)
library(data.table)
library(tidyverse)

my_list = c("Red", "Blue", "Green", "Yellow", "Orange")

d = permn(my_list)

all_combinations  = as.data.frame(matrix(unlist(d), ncol = 120)) %>%
  setNames(paste0("col", 1:120))

# keep combinations that match constraints

 tpose = transpose(all_combinations)

tpose %>%
  mutate(blue_delete = case_when(V1 == "Blue" & V2 == "Green" ~ TRUE,
                                 V1 == "Blue" & V3 == "Green" ~ TRUE,
                                 V2 == "Blue" & V3 == "Green" ~ TRUE,
                                 V3 == "Blue" & V4 == "Green" ~ TRUE,
                                 V4 == "Blue" & V5 == "Green" ~ TRUE,
                                 TRUE ~ FALSE)) %>%
  filter(V3 != "Red" & V4 != "Red" & V5 != "Red",
         V5 != "Yellow",
         blue_delete == FALSE) %>%
  select(-blue_delete)

# preview answer (28 ways)

       V1     V2     V3     V4     V5
1  Orange    Red   Blue Yellow  Green
2     Red Orange   Blue Yellow  Green
3     Red   Blue Orange Yellow  Green
4     Red   Blue Yellow Orange  Green
5     Red   Blue Yellow  Green Orange
6     Red Yellow   Blue Orange  Green
7  Yellow    Red   Blue Orange  Green

My Question: In the above approach, a list of all possible ways that the balls can be sorted must be first generated - since we are dealing with factorials, this list can "explode" really quickly as the number of balls increase, and this enormous list will become impossible to store in the computer's memory.

Is it possible to restructure the above code in a general way, such that :

This way, even though we might not be able to identify ever possible ordering - we can at least identify some of the potential orderings, compared to the original approach which would result in "overflow" (for cases when there a large number of balls).

Does anyone know if there are any standard ways of handling this kind of problem?

Upvotes: 0

Views: 84

Answers (1)

kybazzi
kybazzi

Reputation: 1030

The solution I originally provided in your linked question will continue to work here - the main modification necessary is the way that the data is created.

First let's make a function that will generate a random set of permutations from a dataset of colours.

library(tidyverse)
generate_permutations = function(num_perm, num_new_colours) {
  if (num_new_colours == 0) {
    new_colours = character(0)
  } else {
    new_colours = paste0("colour", 1:num_new_colours)
  }
  colours = c(
    "Red", "Blue", "Green", "Yellow", "Orange", new_colours
  )
  map(1:num_perm, ~ sample(colours))
}

permutations = generate_permutations(3, 5)

[[1]]
 [1] "Red"     "Yellow"  "Blue"    "colour4" "Green"   "Orange" 
 [7] "colour5" "colour3" "colour2" "colour1"

[[2]]
 [1] "colour3" "Blue"    "Orange"  "colour4" "Green"   "colour2"
 [7] "colour1" "Red"     "Yellow"  "colour5"

[[3]]
 [1] "Yellow"  "colour1" "colour4" "Blue"    "colour3" "colour5"
 [7] "Red"     "colour2" "Green"   "Orange" 

You could modify this function so that, instead of generating dummy colours like colour1, you just use real colours of your choosing.

Next we create a data frame of these permutations. Note that in my function I only keep distinct arrangements of the balls.

generate_data = function(permutations) {
  permutations %>%
    map(~ set_names(.x, paste0("ball", 1:length(.x)))) %>%
    do.call(bind_rows, args = .) %>%
    distinct() %>%
    mutate(id = row_number())
}

data = generate_data(permutations)

# A tibble: 3 x 11
  ball1   ball2   ball3   ball4   ball5   ball6  ball7   ball8   ball9   ball10     id
  <chr>   <chr>   <chr>   <chr>   <chr>   <chr>  <chr>   <chr>   <chr>   <chr>   <int>
1 Red     Blue    colour4 colour5 colour1 Green  colour3 Yellow  colour2 Orange      1
2 colour2 Yellow  Orange  colour3 colour4 Blue   Green   colour5 colour1 Red         2
3 colour1 colour2 Blue    Red     Yellow  Orange colour3 colour4 Green   colour5     3

Before proceeding, we could generalize the rules a little. Let's express the rules using the following parameters:

  • Condition 1: The red ball must be within the first x positions.

  • Condition 2: There must be at least y positions between the green and blue balls.

  • Condition 3: The yellow ball cannot be in the last z positions.

Now we can create a function that filters the data based on these parameters. The function is nearly identical to the solution I provided in the last question.

filter_data = function(data, x = 1, y = 2, z = 1) {
  num_balls = length(data) - 1
  data %>%
    pivot_longer(-id) %>%
    mutate(ball_number = as.numeric(str_extract(name, "[0-9]+"))) %>%
    group_by(id) %>%
    filter(
      # Condition 1
      ball_number[value == "Red"] <= x,
      # Condition 2
      abs(ball_number[value == "Blue"] - ball_number[value == "Green"]) > y,
      # Condition 3
      ball_number[value == "Yellow"] <= num_balls - z
    ) %>%
    select(-ball_number) %>% 
    pivot_wider(values_from = "value", names_from = "name") %>%
    ungroup()
}

As I mentioned in the previous answer, a benefit to this approach is how versatile the condition-making is within this function. You could easily modify these conditions or add new ones without needing to hard-code things such as the exact position of the blue and green balls relative to one another.

Now we are ready to put everything together into a script:

library(tidyverse)
set.seed(123)
valid_perms = generate_permutations(
  num_perm = 10000, 
  num_new_colours = 20
) %>%
  generate_data() %>%
  filter_data(x = 5, y = 3, z = 1)

Even with num_perm = 10000, the code still runs in less than a second, and results in 1528 solutions:

# A tibble: 1,528 x 26
# Groups:   id [1,528]
      id ball1  ball2  ball3 ball4 ball5 ball6 ball7 ball8 ball9 ball10
   <int> <chr>  <chr>  <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> 
 1    13 colou~ colou~ Oran~ colo~ Red   colo~ colo~ Blue  colo~ colou~
 2    16 Red    colou~ colo~ Yell~ colo~ colo~ colo~ colo~ colo~ Blue  
 3    27 colou~ colou~ Red   colo~ Oran~ colo~ colo~ colo~ colo~ colou~
 4    33 Red    Blue   colo~ colo~ Oran~ Yell~ Green colo~ colo~ colou~
 5    44 colou~ colou~ colo~ colo~ Red   colo~ colo~ Oran~ colo~ colou~
 6    45 colou~ Orange colo~ Red   colo~ colo~ Green Yell~ colo~ colou~
 7    48 colou~ colou~ Red   colo~ colo~ colo~ Blue  colo~ Oran~ colou~
 8    52 Green  colou~ Red   colo~ colo~ colo~ colo~ colo~ Blue  colou~
 9    58 colou~ colou~ colo~ colo~ Red   colo~ colo~ colo~ Oran~ colou~
10    60 colou~ Red    colo~ Blue  Oran~ colo~ colo~ colo~ colo~ colou~
# ... with 1,518 more rows, and 15 more variables: ball11 <chr>,
#   ball12 <chr>, ball13 <chr>, ball14 <chr>, ball15 <chr>,
#   ball16 <chr>, ball17 <chr>, ball18 <chr>, ball19 <chr>,
#   ball20 <chr>, ball21 <chr>, ball22 <chr>, ball23 <chr>,
#   ball24 <chr>, ball25 <chr>

This solution is practical for more expensive setups. I tried it with 100 balls and 100000 permutations, and obtained 15,647 arrangements within 10 seconds.

You could write a function that will run the code above until you reach a desired number of permutations as well:

get_permutations = function(desired, num_new_colours, x = 1, y = 2, z = 1) {
  valid_perms = tibble()
  num_it = 1
  while (nrow(valid_perms) < desired && num_it <= 100) {
    cur_num = nrow(valid_perms)
    new_perms = generate_permutations(
      num_perm = 1000, 
      num_new_colours = num_new_colours
    ) %>%
      generate_data() %>%
      filter_data(x, y, z) %>%
      select(-id)
    valid_perms = valid_perms %>%
      bind_rows(new_perms) %>%
      distinct()
    
    if (nrow(valid_perms) == cur_num) break
    num_it = num_it + 1
  }
  head(valid_perms, min(desired, nrow(valid_perms)))
}

Note that I cap the number of iterations to 100, and also break the loop if the current random set of 1000 permutations has not added a single new permutation to the current list.

get_permutations(
  desired = 1000,
  num_new_colours = 20,
  # The original conditions:
  x = 2, y = 2, z = 1
)

# A tibble: 1,000 x 25
   ball1  ball2 ball3 ball4 ball5 ball6 ball7 ball8 ball9 ball10 ball11
   <chr>  <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>  <chr> 
 1 colou~ Red   colo~ colo~ colo~ colo~ colo~ colo~ colo~ Yellow Green 
 2 colou~ Red   colo~ Blue  colo~ colo~ Green colo~ Oran~ Yellow colou~
 3 Red    colo~ Yell~ colo~ colo~ colo~ colo~ colo~ colo~ colou~ colou~
 4 Red    Green colo~ colo~ colo~ colo~ Blue  colo~ colo~ colou~ Orange
 5 colou~ Red   colo~ colo~ colo~ Oran~ colo~ colo~ colo~ colou~ colou~
 6 Yellow Red   colo~ colo~ colo~ colo~ colo~ colo~ colo~ colou~ Green 
 7 Red    colo~ Green colo~ colo~ colo~ colo~ colo~ colo~ colou~ colou~
 8 colou~ Red   colo~ colo~ colo~ Green colo~ colo~ colo~ colou~ colou~
 9 Red    colo~ colo~ colo~ colo~ Green colo~ colo~ Blue  colou~ colou~
10 colou~ Red   colo~ colo~ colo~ colo~ colo~ colo~ colo~ colou~ colou~
# ... with 990 more rows, and 14 more variables: ball12 <chr>,
#   ball13 <chr>, ball14 <chr>, ball15 <chr>, ball16 <chr>,
#   ball17 <chr>, ball18 <chr>, ball19 <chr>, ball20 <chr>,
#   ball21 <chr>, ball22 <chr>, ball23 <chr>, ball24 <chr>,
#   ball25 <chr>

This ran in about 3 seconds. If you change the x, y, and z parameters to be less restrictive, the code will run even faster as it's easier to find permutations that satisfy the conditions.

Finally, here's what happens if we try to find the exact same solutions as in the original question (with just five balls):

get_permutations(
  desired = 100,
  num_new_colours = 0,
  # Original conditions:
  x = 2, y = 2, z = 1
)
# A tibble: 10 x 5
   ball1 ball2 ball3  ball4  ball5 
   <chr> <chr> <chr>  <chr>  <chr> 
 1 Red   Green Orange Yellow Blue  
 2 Red   Green Yellow Orange Blue  
 3 Blue  Red   Orange Yellow Green 
 4 Red   Blue  Yellow Orange Green 
 5 Green Red   Yellow Orange Blue  
 6 Blue  Red   Yellow Orange Green 
 7 Red   Blue  Orange Yellow Green 
 8 Blue  Red   Yellow Green  Orange
 9 Green Red   Orange Yellow Blue  
10 Green Red   Yellow Blue   Orange

Upvotes: 1

Related Questions