Reputation: 5897
I thought of the following problem:
Suppose there are 5 balls:
I want to find out the number of ways that these 5 balls can be ordered such that:
In a previous question (Filtering a Data Frame based on Row Conditions), I learned how to first generate a list of all possible ways that these 5 balls can be ordered, and then only keep entries in this list that satisfy the above constraints:
# generate all possible combinations (120 combinations)
library(combinat)
library(dplyr)
library(data.table)
library(tidyverse)
my_list = c("Red", "Blue", "Green", "Yellow", "Orange")
d = permn(my_list)
all_combinations = as.data.frame(matrix(unlist(d), ncol = 120)) %>%
setNames(paste0("col", 1:120))
# keep combinations that match constraints
tpose = transpose(all_combinations)
tpose %>%
mutate(blue_delete = case_when(V1 == "Blue" & V2 == "Green" ~ TRUE,
V1 == "Blue" & V3 == "Green" ~ TRUE,
V2 == "Blue" & V3 == "Green" ~ TRUE,
V3 == "Blue" & V4 == "Green" ~ TRUE,
V4 == "Blue" & V5 == "Green" ~ TRUE,
TRUE ~ FALSE)) %>%
filter(V3 != "Red" & V4 != "Red" & V5 != "Red",
V5 != "Yellow",
blue_delete == FALSE) %>%
select(-blue_delete)
# preview answer (28 ways)
V1 V2 V3 V4 V5
1 Orange Red Blue Yellow Green
2 Red Orange Blue Yellow Green
3 Red Blue Orange Yellow Green
4 Red Blue Yellow Orange Green
5 Red Blue Yellow Green Orange
6 Red Yellow Blue Orange Green
7 Yellow Red Blue Orange Green
My Question: In the above approach, a list of all possible ways that the balls can be sorted must be first generated - since we are dealing with factorials, this list can "explode" really quickly as the number of balls increase, and this enormous list will become impossible to store in the computer's memory.
Is it possible to restructure the above code in a general way, such that :
Step 1: A random ordering of balls is generated
Step 2: If the ordering from Step 1 satisfies the conditions, it is kept in a separate list - if it does not satisfy the conditions, it is discarded.
Step 3: A new random ordering is generated
Step 4: Continue until the list from Step 2 contains a fixed number of entries (e.g. 1000)
This way, even though we might not be able to identify ever possible ordering - we can at least identify some of the potential orderings, compared to the original approach which would result in "overflow" (for cases when there a large number of balls).
Does anyone know if there are any standard ways of handling this kind of problem?
Upvotes: 0
Views: 84
Reputation: 1030
The solution I originally provided in your linked question will continue to work here - the main modification necessary is the way that the data is created.
First let's make a function that will generate a random set of permutations from a dataset of colours.
library(tidyverse)
generate_permutations = function(num_perm, num_new_colours) {
if (num_new_colours == 0) {
new_colours = character(0)
} else {
new_colours = paste0("colour", 1:num_new_colours)
}
colours = c(
"Red", "Blue", "Green", "Yellow", "Orange", new_colours
)
map(1:num_perm, ~ sample(colours))
}
permutations = generate_permutations(3, 5)
[[1]]
[1] "Red" "Yellow" "Blue" "colour4" "Green" "Orange"
[7] "colour5" "colour3" "colour2" "colour1"
[[2]]
[1] "colour3" "Blue" "Orange" "colour4" "Green" "colour2"
[7] "colour1" "Red" "Yellow" "colour5"
[[3]]
[1] "Yellow" "colour1" "colour4" "Blue" "colour3" "colour5"
[7] "Red" "colour2" "Green" "Orange"
You could modify this function so that, instead of generating dummy colours like colour1
, you just use real colours of your choosing.
Next we create a data frame of these permutations. Note that in my function I only keep distinct arrangements of the balls.
generate_data = function(permutations) {
permutations %>%
map(~ set_names(.x, paste0("ball", 1:length(.x)))) %>%
do.call(bind_rows, args = .) %>%
distinct() %>%
mutate(id = row_number())
}
data = generate_data(permutations)
# A tibble: 3 x 11
ball1 ball2 ball3 ball4 ball5 ball6 ball7 ball8 ball9 ball10 id
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <int>
1 Red Blue colour4 colour5 colour1 Green colour3 Yellow colour2 Orange 1
2 colour2 Yellow Orange colour3 colour4 Blue Green colour5 colour1 Red 2
3 colour1 colour2 Blue Red Yellow Orange colour3 colour4 Green colour5 3
Before proceeding, we could generalize the rules a little. Let's express the rules using the following parameters:
Condition 1: The red ball must be within the first x
positions.
Condition 2: There must be at least y
positions between the green and blue balls.
Condition 3: The yellow ball cannot be in the last z
positions.
Now we can create a function that filters the data based on these parameters. The function is nearly identical to the solution I provided in the last question.
filter_data = function(data, x = 1, y = 2, z = 1) {
num_balls = length(data) - 1
data %>%
pivot_longer(-id) %>%
mutate(ball_number = as.numeric(str_extract(name, "[0-9]+"))) %>%
group_by(id) %>%
filter(
# Condition 1
ball_number[value == "Red"] <= x,
# Condition 2
abs(ball_number[value == "Blue"] - ball_number[value == "Green"]) > y,
# Condition 3
ball_number[value == "Yellow"] <= num_balls - z
) %>%
select(-ball_number) %>%
pivot_wider(values_from = "value", names_from = "name") %>%
ungroup()
}
As I mentioned in the previous answer, a benefit to this approach is how versatile the condition-making is within this function. You could easily modify these conditions or add new ones without needing to hard-code things such as the exact position of the blue and green balls relative to one another.
Now we are ready to put everything together into a script:
library(tidyverse)
set.seed(123)
valid_perms = generate_permutations(
num_perm = 10000,
num_new_colours = 20
) %>%
generate_data() %>%
filter_data(x = 5, y = 3, z = 1)
Even with num_perm = 10000
, the code still runs in less than a second, and results in 1528 solutions:
# A tibble: 1,528 x 26
# Groups: id [1,528]
id ball1 ball2 ball3 ball4 ball5 ball6 ball7 ball8 ball9 ball10
<int> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 13 colou~ colou~ Oran~ colo~ Red colo~ colo~ Blue colo~ colou~
2 16 Red colou~ colo~ Yell~ colo~ colo~ colo~ colo~ colo~ Blue
3 27 colou~ colou~ Red colo~ Oran~ colo~ colo~ colo~ colo~ colou~
4 33 Red Blue colo~ colo~ Oran~ Yell~ Green colo~ colo~ colou~
5 44 colou~ colou~ colo~ colo~ Red colo~ colo~ Oran~ colo~ colou~
6 45 colou~ Orange colo~ Red colo~ colo~ Green Yell~ colo~ colou~
7 48 colou~ colou~ Red colo~ colo~ colo~ Blue colo~ Oran~ colou~
8 52 Green colou~ Red colo~ colo~ colo~ colo~ colo~ Blue colou~
9 58 colou~ colou~ colo~ colo~ Red colo~ colo~ colo~ Oran~ colou~
10 60 colou~ Red colo~ Blue Oran~ colo~ colo~ colo~ colo~ colou~
# ... with 1,518 more rows, and 15 more variables: ball11 <chr>,
# ball12 <chr>, ball13 <chr>, ball14 <chr>, ball15 <chr>,
# ball16 <chr>, ball17 <chr>, ball18 <chr>, ball19 <chr>,
# ball20 <chr>, ball21 <chr>, ball22 <chr>, ball23 <chr>,
# ball24 <chr>, ball25 <chr>
This solution is practical for more expensive setups. I tried it with 100 balls and 100000 permutations, and obtained 15,647 arrangements within 10 seconds.
You could write a function that will run the code above until you reach a desired number of permutations as well:
get_permutations = function(desired, num_new_colours, x = 1, y = 2, z = 1) {
valid_perms = tibble()
num_it = 1
while (nrow(valid_perms) < desired && num_it <= 100) {
cur_num = nrow(valid_perms)
new_perms = generate_permutations(
num_perm = 1000,
num_new_colours = num_new_colours
) %>%
generate_data() %>%
filter_data(x, y, z) %>%
select(-id)
valid_perms = valid_perms %>%
bind_rows(new_perms) %>%
distinct()
if (nrow(valid_perms) == cur_num) break
num_it = num_it + 1
}
head(valid_perms, min(desired, nrow(valid_perms)))
}
Note that I cap the number of iterations to 100, and also break the loop if the current random set of 1000 permutations has not added a single new permutation to the current list.
get_permutations(
desired = 1000,
num_new_colours = 20,
# The original conditions:
x = 2, y = 2, z = 1
)
# A tibble: 1,000 x 25
ball1 ball2 ball3 ball4 ball5 ball6 ball7 ball8 ball9 ball10 ball11
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 colou~ Red colo~ colo~ colo~ colo~ colo~ colo~ colo~ Yellow Green
2 colou~ Red colo~ Blue colo~ colo~ Green colo~ Oran~ Yellow colou~
3 Red colo~ Yell~ colo~ colo~ colo~ colo~ colo~ colo~ colou~ colou~
4 Red Green colo~ colo~ colo~ colo~ Blue colo~ colo~ colou~ Orange
5 colou~ Red colo~ colo~ colo~ Oran~ colo~ colo~ colo~ colou~ colou~
6 Yellow Red colo~ colo~ colo~ colo~ colo~ colo~ colo~ colou~ Green
7 Red colo~ Green colo~ colo~ colo~ colo~ colo~ colo~ colou~ colou~
8 colou~ Red colo~ colo~ colo~ Green colo~ colo~ colo~ colou~ colou~
9 Red colo~ colo~ colo~ colo~ Green colo~ colo~ Blue colou~ colou~
10 colou~ Red colo~ colo~ colo~ colo~ colo~ colo~ colo~ colou~ colou~
# ... with 990 more rows, and 14 more variables: ball12 <chr>,
# ball13 <chr>, ball14 <chr>, ball15 <chr>, ball16 <chr>,
# ball17 <chr>, ball18 <chr>, ball19 <chr>, ball20 <chr>,
# ball21 <chr>, ball22 <chr>, ball23 <chr>, ball24 <chr>,
# ball25 <chr>
This ran in about 3 seconds. If you change the x
, y
, and z
parameters to be less restrictive, the code will run even faster as it's easier to find permutations that satisfy the conditions.
Finally, here's what happens if we try to find the exact same solutions as in the original question (with just five balls):
get_permutations(
desired = 100,
num_new_colours = 0,
# Original conditions:
x = 2, y = 2, z = 1
)
# A tibble: 10 x 5
ball1 ball2 ball3 ball4 ball5
<chr> <chr> <chr> <chr> <chr>
1 Red Green Orange Yellow Blue
2 Red Green Yellow Orange Blue
3 Blue Red Orange Yellow Green
4 Red Blue Yellow Orange Green
5 Green Red Yellow Orange Blue
6 Blue Red Yellow Orange Green
7 Red Blue Orange Yellow Green
8 Blue Red Yellow Green Orange
9 Green Red Orange Yellow Blue
10 Green Red Yellow Blue Orange
Upvotes: 1