Reputation: 189
The problem
I am looking for a way to create a list of all possible combinations of a set of parameters with their values, where each parameter occurs exactly once in each combination. An example input set would look like this:
sampleData = data.frame(Parameter= c("A","B","B","C","C","C","D","D"),
Value = c(1,0.9,1,0.8,1,1.2,0.8,1.1))
Parameter Value
1 A 1.0
2 B 0.9
3 B 1.0
4 C 0.8
5 C 1.0
6 C 1.2
7 D 0.8
8 D 1.1
The desired output is a list of all unique ABCD combinations, so the first two elements of the list would be, e.g.:
[[1]]
Parameter Value
1 A 1.0
2 B 0.9
3 C 0.8
4 D 0.8
[[2]]
Parameter Value
1 A 1.0
2 B 1.0
3 C 0.8
4 D 0.8
My attempt so far
I have looked into the combinations function in the gtools package, and the following does something close to what I want:
library(gtools)
combinations(n = nrow(sampleData),
             r = length(unique(sampleData$Parameter)),
             v = paste0(sampleData$Parameter, "_", sampleData$Value))
With some post-processing I would be able to get the desired result out of this. But combinations also yields results like
Parameter Value
1 A 1.0
2 B 0.9
3 B 1.0
4 D 0.8
i.e. with one (or more) parameters occurring multiple times.
I could filter these out afterwards, but the output of combinations quickly grows large (already 70 rows for this example, growing as choose(n, r), i.e. n! / (r! (n - r)!)), while the desired output list grows much less rapidly (12 in this example, the product of the number of values per parameter).
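As a quick check of the two counts mentioned above, here is a minimal base R sketch. It assumes the sampleData frame from the example; the names n_unrestricted and n_valid are only illustrative.

```r
# Example input, repeated so the snippet runs on its own
sampleData <- data.frame(
  Parameter = c("A", "B", "B", "C", "C", "C", "D", "D"),
  Value     = c(1, 0.9, 1, 0.8, 1, 1.2, 0.8, 1.1)
)

n <- nrow(sampleData)                      # number of parameter/value rows
r <- length(unique(sampleData$Parameter))  # number of distinct parameters

# Rows produced by choosing any r of the n rows, ignoring parameter names
n_unrestricted <- choose(n, r)

# Combinations with exactly one value per parameter:
# the product of the number of values available for each parameter
n_valid <- prod(table(sampleData$Parameter))

print(c(unrestricted = n_unrestricted, valid = n_valid))
```

For this input the first count is choose(8, 4) and the second is 1 * 2 * 3 * 2.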
So my question is: Is there a (relatively) efficient way to generate the desired output of ABCD value combinations without first generating a much larger set and then stripping invalid combinations out?
Upvotes: 1
Views: 766
Reputation: 29125
I think you're looking for expand.grid().
The following will return a data frame with each unique combination in a row:
library(dplyr)
df <- sampleData %>%
  split(.$Parameter) %>%             # create a data frame of values for each parameter
  lapply(function(d) d$Value) %>%    # extract the values for each parameter as a vector
  expand.grid()                      # generate all combinations, one per row
> df
A B C D
1 1 0.9 0.8 0.8
2 1 1.0 0.8 0.8
3 1 0.9 1.0 0.8
4 1 1.0 1.0 0.8
5 1 0.9 1.2 0.8
6 1 1.0 1.2 0.8
7 1 0.9 0.8 1.1
8 1 1.0 0.8 1.1
9 1 0.9 1.0 1.1
10 1 1.0 1.0 1.1
11 1 0.9 1.2 1.1
12 1 1.0 1.2 1.1
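If you would rather avoid the dplyr dependency, the same grid can probably be built in base R alone. This is a sketch rather than the only way to do it: it relies on split() accepting a vector plus a grouping vector, and on expand.grid() accepting a named list. The name grid is just an illustrative choice.

```r
# Example input, repeated so the snippet runs on its own
sampleData <- data.frame(
  Parameter = c("A", "B", "B", "C", "C", "C", "D", "D"),
  Value     = c(1, 0.9, 1, 0.8, 1, 1.2, 0.8, 1.1)
)

# Named list with one vector of candidate values per parameter
values_by_parameter <- split(sampleData$Value, sampleData$Parameter)

# One row per combination, one column per parameter
grid <- expand.grid(values_by_parameter)

print(dim(grid))
print(head(grid, 2))
```

Because expand.grid() builds only the Cartesian product of the per-parameter value sets, no invalid combinations are generated and nothing has to be filtered out afterwards.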
And if you want the results converted into a list of data frames:
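library(tidyr)
df %>%
  mutate(combination = row_number()) %>%        # label each combination
  gather(Parameter, Value, -combination) %>%    # reshape to long format
  split(.$combination) %>%                      # one data frame per combination
  lapply(function(d) d[, -1])                   # drop the combination column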
$`1`
Parameter Value
1 A 1.0
13 B 0.9
25 C 0.8
37 D 0.8
$`2`
Parameter Value
2 A 1.0
14 B 1.0
26 C 0.8
38 D 0.8
...
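The conversion to a list of data frames can also be sketched without tidyr, by walking over the rows of the grid. This assumes the same sampleData as in the question; grid and combos are illustrative names, and unlike the split() version the resulting list is unnamed and each data frame has row names 1 to 4.

```r
# Example input, repeated so the snippet runs on its own
sampleData <- data.frame(
  Parameter = c("A", "B", "B", "C", "C", "C", "D", "D"),
  Value     = c(1, 0.9, 1, 0.8, 1, 1.2, 0.8, 1.1)
)

# All combinations with exactly one value per parameter
grid <- expand.grid(split(sampleData$Value, sampleData$Parameter))

# Turn each row of the grid into a two-column Parameter/Value data frame
combos <- lapply(seq_len(nrow(grid)), function(i) {
  data.frame(
    Parameter = names(grid),
    Value     = unlist(grid[i, ], use.names = FALSE)
  )
})

print(length(combos))
print(combos[[1]])
```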
Upvotes: 4