Reputation: 15395
I'm trying to generate test datasets with a fixed number of rows. However, I'm writing to a destination that requires known keys for each column. For my example, assume that these keys are lowercase letters, uppercase letters and numbers respectively.
I need to write a function which, given only the required number of rows, combines keys such that the number of combinations equals the required number. Naturally there will be some impossible cases, such as prime numbers larger than the largest number of keys in any column, and values larger than the product of the numbers of keys.
A sample output dataset of 10 rows could look like the following:
data.frame(col1 = rep("a", 10),
           col2 = rep(LETTERS[1:5], 2),
           col3 = rep(1:2, 5))
col1 col2 col3
1 a A 1
2 a B 2
3 a C 1
4 a D 2
5 a E 1
6 a A 2
7 a B 1
8 a C 2
9 a D 1
10 a E 2
Note here that I had to manually specify the keys to get the desired number of rows. How can I arrange things so that R can do this for me?
Things I've already considered
- optim - The equation I'm trying to solve is effectively x * y * z = n, where all of them must be integers. optim doesn't seem to support that constraint.
- expand.grid and then subset - almost 500 million combinations, eats up all my memory - not an option.
- lpSolve - Has the integer option, but only seems to support linear equations. Could use logs to make it linear, but then I can't use the integer option.
- factorize from gmp to get factors - Thought about this, but I can't think of a way to distribute the prime factors back into the keys.
EDIT: Maybe a bin packing problem?
Upvotes: 0
Views: 115
Reputation: 1702
For small-scale integer optimisation you can use a grid search. Other possibilities are described here.
This should work for your example.
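library(NMOF)
N <- 10
# Squared distance between the product of the three key counts and N;
# it is zero exactly when x1 * x2 * x3 == N
fr <- function(x) {
  (x[1] * x[2] * x[3] - N)^2
}
# Try 1 to 5 keys per column (a count of 0 would give an empty dataset)
gridSearch(fr, list(seq(1, 5), seq(1, 5), seq(1, 5)))$minlevels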
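A grid over all three ranges grows quickly once the key sets get large. Here is a minimal sketch of a narrower search, assuming the column limits are 26, 26 and 10 as in the question's example. The function name `find_key_counts` and its `max_keys` argument are made up for illustration. It only tries divisors of the target for the first two columns and takes whatever remains as the third count.

```r
find_key_counts <- function(n, max_keys = c(26, 26, 10)) {
  # Candidate counts for column 1: divisors of n within that column's limit
  for (x in seq_len(min(n, max_keys[1]))) {
    if (n %% x != 0) next
    rest <- n / x
    # Candidate counts for column 2: divisors of what is left
    for (y in seq_len(min(rest, max_keys[2]))) {
      if (rest %% y != 0) next
      z <- rest / y
      # Column 3 takes the remaining factor, if it fits
      if (z <= max_keys[3]) return(c(x, y, z))
    }
  }
  NULL  # no combination of counts multiplies to n
}

find_key_counts(10)  # three counts whose product is 10
find_key_counts(29)  # NULL: 29 is prime and larger than every limit
```

It returns the first valid combination it meets, not a "best" one, and gives `NULL` for the impossible cases mentioned in the question.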
Upvotes: 1
Reputation: 1795
I'm a bit hesitant, but we can work something out:
a1 <- 2
a2 <- 5
data.frame(col1 = rep(LETTERS[1], a1 * a2),
           col2 = rep(LETTERS[1:a2], a1),
           col3 = rep(1:a1, a2))
col1 col2 col3
1 A A 1
2 A B 2
3 A C 1
4 A D 2
5 A E 1
6 A A 2
7 A B 1
8 A C 2
9 A D 1
10 A E 2
Is this something similar to what you are asking?
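One caveat with rep(): if the two counts share a common factor (say 2 and 2), the recycled columns line up and you get duplicate rows. A minimal sketch that avoids this, assuming you already know how many keys to use per column and that the keys are the letters and numbers from the question. The name `make_test_data` is just for illustration.

```r
make_test_data <- function(counts) {
  # expand.grid crosses only the selected keys, so the result has
  # prod(counts) rows and every key combination appears exactly once
  expand.grid(col1 = letters[seq_len(counts[1])],
              col2 = LETTERS[seq_len(counts[2])],
              col3 = seq_len(counts[3]),
              stringsAsFactors = FALSE)
}

df <- make_test_data(c(1, 5, 2))
nrow(df)  # 10
```

Because only the chosen subset of keys is crossed, this stays small, unlike running expand.grid over all keys and subsetting afterwards.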
Upvotes: 0