Reputation: 5935
I thought of the following example to illustrate my question.
Suppose there are 5 balls:
Normally, there are 5! = 120 ways these balls can be organized (n!). I can enumerate all these combinations below:
library(combinat)
library(dplyr)
my_list = c("Red", "Blue", "Green", "Yellow", "Orange")
d = permn(my_list)
all_combinations = as.data.frame(matrix(unlist(d), ncol = 120)) %>%
setNames(paste0("col", 1:120))
all_combinations[,1:5]
col1 col2 col3 col4 col5
1 Red Red Red Red Orange
2 Blue Blue Blue Orange Red
3 Green Green Orange Blue Blue
4 Yellow Orange Green Green Green
5 Orange Yellow Yellow Yellow Yellow
My Question:
Suppose I wanted to filter this list by the following conditions:
I then tried to filter the above data based on these 3 conditons:
# attempt to write first condition
cond_1 <- all_combinations[which(all_combinations[1,]== "Red" || all_combinations[2,] == "Red"), ]
#not sure how to write the second condition
# attempt to write the third condition
cond_3 <- data_frame_version[which(data_frame_version[5,] !== "Yellow" ), ]
# if everything worked, an "anti join" style statement could be written to remove "cond_1, cond_2, cond_3" from the original data?
But this is not working at all - the first and third condition return a data frame only containing 4 rows for all the columns.
Can someone please show me how to correctly filter "all_combinations" using the above 3 filters?
Note:
The following code can transpose the original data:
library(data.table)
tpose = transpose(all_combinations)
df = tpose
#group every 5 rows by the same id to identify unique combinations
bloc_len <- 5
df$bloc <-
rep(seq(1, 1 + nrow(df) %/% bloc_len), each = bloc_len, length.out = nrow(df))
head(df)
V1 V2 V3 V4 V5 bloc
1 Red Blue Green Yellow Orange 1
2 Red Blue Green Orange Yellow 1
3 Red Blue Orange Green Yellow 1
4 Red Orange Blue Green Yellow 1
5 Orange Red Blue Green Yellow 1
6 Orange Red Blue Yellow Green 2
Upvotes: 2
Views: 773
Reputation: 6784
I think you may find it easier to apply the three conditions if you consider the position of each ball rather than the ball in each position, as in
library(combinat)
my_list = c("Red", "Blue", "Green", "Yellow", "Orange")
d = permn(1:5)
md = matrix(unlist(d), ncol=5, byrow=TRUE)
colnames(md) = my_list
ok = md[, "Red"] <= 2 &
abs(md[, "Blue"] - md[, "Green"]) > 2 &
md[, "Yellow"] != 5
sum(ok)
# 10
md[ok, ]
# Red Blue Green Yellow Orange
# [1,] 1 2 5 3 4
# [2,] 1 5 2 3 4
# [3,] 1 5 2 4 3
# [4,] 1 2 5 4 3
# [5,] 2 4 1 3 5
# [6,] 2 1 4 3 5
# [7,] 2 1 5 4 3
# [8,] 2 5 1 4 3
# [9,] 2 5 1 3 4
#[10,] 2 1 5 3 4
Upvotes: 1
Reputation: 1374
Maybe I misread the question but as I see it, none of the answers seem to show a solution where there are 2 columns between the colors in step 2 of the question.
I took the liberty of testing the data and found that only if you use "Yellow" and "Orange" you can find a filter condition doing what you want (as far as I understand it).
This is not a general answer and it is not actually correct because "Yellow" is on the last row, violating the rule but:
To have distance of 2 between your colors while the last row is already accounted for reduces the problem to a 4 column problem. So a distance of 2 can only be achieved between column 1 and column 4. This leads to 4 assumptions:
Column 1 would need to be either "Green" or "Blue"
Column 2 would need to be "Red"
Column 3 should be neither "Green" nor "Blue"
Column 4 should be again either "Green" or "Blue" but not Column 1
This is the code I came up with, not beautiful and as explained "Green" and "Blue" switched to "Yellow" and "Orange" but else I think this works.
library(combinat)
library(tidyverse)
my_list = c("Red", "Blue", "Green", "Yellow", "Orange")
d = permn(my_list)
all_combinations = as.data.frame(matrix(unlist(d), ncol = 5)) %>%
setNames(paste0("col", 1:5))
`%!in%` <- Negate(`%in%`)
combis <- all_combinations %>%
filter(col1 %in% c("Yellow", "Orange"),
col2 == "Red",
!col3 %in% c("Yellow", "Orange"),
col5 == "Yellow")
results <- vector()
for(i in seq_along(combis[,1])){
if(combis[i,][1] %!in% c(combis[i,][4], "Red", "Green", "Blue")){
results <- combis[i,]
}
}
results
col1 col2 col3 col4 col5
3 Yellow Red Green Orange Yellow
Upvotes: 1
Reputation: 1030
Here's a scalable tidyverse solution.
First, let's make the data a tibble of 120 rows, one for each combination of balls.
library(tidyverse)
library(combinat)
data = my_list %>%
permn() %>%
map(~ set_names(.x, paste0("ball", 1:5))) %>%
do.call(bind_rows, args = .) %>%
mutate(id = row_number())
Our data:
# A tibble: 120 x 6
ball1 ball2 ball3 ball4 ball5 id
<chr> <chr> <chr> <chr> <chr> <int>
1 Red Blue Green Yellow Orange 1
2 Red Blue Green Orange Yellow 2
3 Red Blue Orange Green Yellow 3
4 Red Orange Blue Green Yellow 4
5 Orange Red Blue Green Yellow 5
6 Orange Red Blue Yellow Green 6
7 Red Orange Blue Yellow Green 7
8 Red Blue Orange Yellow Green 8
9 Red Blue Yellow Orange Green 9
10 Red Blue Yellow Green Orange 10
# ... with 110 more rows
The key idea of this solution is to transform the data into long format. This will make it trivial to check for each condition. Afterwards we can bring it back to wide format.
data %>%
pivot_longer(-id) %>%
mutate(ball_number = as.numeric(str_extract(name, "[1-5]"))) %>%
group_by(id) %>%
filter(
# Condition 1
ball_number[value == "Red"] %in% c(1, 2),
# Condition 2
abs(ball_number[value == "Blue"] - ball_number[value == "Green"]) >= 3,
# Condition 3
ball_number[value == "Yellow"] != 5
) %>%
select(-ball_number) %>%
pivot_wider(values_from = "value", names_from = "name")
The output shows that there are 10 permutations:
# A tibble: 10 x 6
# Groups: id [10]
id ball1 ball2 ball3 ball4 ball5
<int> <chr> <chr> <chr> <chr> <chr>
1 8 Red Blue Orange Yellow Green
2 9 Red Blue Yellow Orange Green
3 32 Red Green Yellow Orange Blue
4 33 Red Green Orange Yellow Blue
5 48 Green Red Orange Yellow Blue
6 49 Green Red Yellow Orange Blue
7 50 Green Red Yellow Blue Orange
8 111 Blue Red Yellow Green Orange
9 112 Blue Red Yellow Orange Green
10 113 Blue Red Orange Yellow Green
An improvement this solution offers is that all the conditions you want to check for are very straightforward because of our variable ball_number
. You could easily scale this solution to more complex conditions if there were more balls, such as the first 5 balls being red, or the blue ball plus green ball being equal to 7.
Upvotes: 1
Reputation: 285
If you are not so concerned about the data.frame
structure, my preferred approach would be to retain each outcome as the member of a list (i.e. your d
variable) and sapply()
that with a function which checks that all of the conditions are satisfied for that outcome.
Observe:
library(combinat)
my_list <- c("Red", "Blue", "Green", "Yellow", "Orange")
my_list_perm <- combinat::permn(my_list)
# This function examines one particular outcome of the trial, e.g. outcome = ["Blue", "Orange", "Red", "Green", "Yellow"]
test_conditions <- function(outcome) {
# Condition 1
condition_1 <- "Red" %in% outcome[c(1,2)]
# Condition 2
condition_2 <- base::abs(base::which(outcome == "Blue") - base::which(outcome == "Green")) >= 2
# Condition 3
condition_3 <- base::which(outcome == "Yellow") != base::length(outcome)
all <- condition_1 && condition_2 && condition_3
return(all)
}
my_list_matches <- base::which(base::sapply(my_list_perm, test_conditions)) # applies the function to each list element (which itself is an outcome)
print(my_list_matches) # displays which trials / outcomes satisfied all conditions
#> [1] 6 7 8 9 10 12 19 22 29 31 32 33 34 35 41 48 49 50 111 112 113 120
Created on 2022-01-04 by the reprex package (v1.0.0)
You can then use the matching indices to filter on the original list.
Upvotes: 1
Reputation: 1391
Here's what you can do. I know it's not the prettiest and most optimized solution you'll find out there. But it works!
all_combinations = as.data.frame(matrix(unlist(d), ncol = 5)) %>%
setNames(paste0("col", 1:5))
cond_1 <- all_combinations %>%
filter(col1 == "Red" | col2 == "Red")
cond_2 <- cond_1 %>%
filter(col1 == "Blue" | col1 == "Green" |
col2 == "Blue" | col2 == "Green" |
col3 == "Blue" | col3 == "Green" |
col4 == "Blue" | col4 == "Green" |
col5 == "Blue" | col5 == "Green")
cond_2 <- cond_2 %>%
mutate(cond = ifelse(col1 == 'Blue' & col4 == 'Green', 2, NA) |
ifelse(col1 == 'Blue' & col5 == 'Green', 3, NA) |
ifelse(col2 == 'Blue' & col5 == 'Green', 2, NA) |
ifelse(col1 == 'Green' & col4 == 'Blue', 2, NA) |
ifelse(col2 == 'Green' & col5 == 'Blue', 3, NA)) %>%
filter(cond == T)
cond_3 <- cond_2%>%
filter(col5 != "Yellow")
Output:
col1 col2 col3 col4 col5 cond
1 Blue Red Orange Green Red TRUE
Upvotes: 1
Reputation: 11016
You can do:
library(tidyverse)
tpose %>%
mutate(blue_delete = case_when(V1 == "Blue" & V2 == "Green" ~ TRUE,
V1 == "Blue" & V3 == "Green" ~ TRUE,
V2 == "Blue" & V3 == "Green" ~ TRUE,
V3 == "Blue" & V4 == "Green" ~ TRUE,
V4 == "Blue" & V5 == "Green" ~ TRUE,
TRUE ~ FALSE)) %>%
filter(V3 != "Red" & V4 != "Red" & V5 != "Red",
V5 != "Yellow",
blue_delete == FALSE) %>%
select(-blue_delete)
Upvotes: 1