Given set of column values, create data.frame with known number of rows

Question

I'm trying to generate test datasets with a fixed number of rows. However, I'm writing to a destination that requires known keys for each column. For my example, assume that these keys are lowercase letters, uppercase letters, and numbers respectively.

I need a function that, given only the required number of rows, combines keys so that the number of combinations equals the required number. Naturally, some cases will be impossible, such as prime numbers larger than the size of the largest key set and values larger than the product of the numbers of keys.

A sample output dataset of 10 rows could look like the following:

data.frame(col1 = rep("a", 10),
           col2 = rep(LETTERS[1:5], 2),
           col3 = rep(1:2, 5))

   col1 col2 col3
1     a    A    1
2     a    B    2
3     a    C    1
4     a    D    2
5     a    E    1
6     a    A    2
7     a    B    1
8     a    C    2
9     a    D    1
10    a    E    2

Note here that I had to manually specify the keys to get the desired number of rows. How can I arrange things so that R can do this for me?

Things I've already considered

optim - The equation I'm trying to solve is effectively x * y * z = n, where x, y, and z must all be integers. optim doesn't seem to support that constraint.
expand.grid and then subset - almost 500 million combinations, eats up all my memory - not an option.
lpSolve - Has the integer option, but only seems to support linear equations. Could use logs to make it linear, but then I can't use the integer option.
factorize from gmp to get factors - Thought about this, but I can't think of a way to distribute the prime factors back into the keys. EDIT: Maybe a bin packing problem?
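
Because each key set is small, the x * y * z = n formulation above can also be checked by plainly enumerating divisors, with no optimiser involved. The following is only a minimal sketch under stated assumptions: `find_key_counts` is a hypothetical helper name, and the default limits of 26, 26, and 10 are taken from the example keys (lowercase letters, uppercase letters, digits), not from any real destination.

```r
# Hypothetical helper: find key counts (x, y, z) with x * y * z == n,
# subject to per-column limits. The default limits are assumptions based on
# the example: 26 lowercase letters, 26 uppercase letters, 10 digits.
find_key_counts <- function(n, limits = c(26, 26, 10)) {
  for (x in seq_len(limits[1])) {
    if (n %% x != 0) next        # x must divide n
    rest <- n / x
    for (y in seq_len(limits[2])) {
      if (rest %% y != 0) next   # y must divide what is left
      z <- rest / y
      if (z <= limits[3]) return(c(x, y, z))
    }
  }
  NULL                           # no valid combination exists
}

find_key_counts(10)  # a valid triple whose product is 10
find_key_counts(29)  # NULL: 29 is prime and larger than every limit
```

The loop tries at most 26 * 26 pairs, so it never builds the full grid of key combinations. It returns the first valid triple it finds, which is not necessarily the most balanced one.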

apitsch · Accepted Answer

For small-scale integer optimisation you can use a grid search. Other possibilities are described here.

This should work for your example.

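library(NMOF)

N <- 10  # required number of rows

# Squared difference between the product of the three key counts and N;
# it is zero exactly when x[1] * x[2] * x[3] == N.
fr <- function(x) {
  (x[1] * x[2] * x[3] - N)^2
}

# Evaluate fr on every combination of candidate counts and return the best one.
# Counts start at 1 because a column with zero keys cannot produce any rows.
gridSearch(fr, list(1:5, 1:5, 1:5))$minlevels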
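
Once a triple of counts is known, one possible way to turn it into the data frame is to expand only the first few keys of each column. This is a sketch rather than part of the answer above: `make_test_df` is a hypothetical helper, and the key sets (`letters`, `LETTERS`, and the digits 0 to 9) are the ones assumed in the question.

```r
# Hypothetical helper: build the test data frame from chosen key counts.
# `counts` is a length-3 vector of key counts, such as the $minlevels
# result of the grid search. The key sets are assumptions from the question.
make_test_df <- function(counts) {
  expand.grid(col1 = letters[seq_len(counts[1])],
              col2 = LETTERS[seq_len(counts[2])],
              col3 = (0:9)[seq_len(counts[3])],
              stringsAsFactors = FALSE)
}

df <- make_test_df(c(1, 2, 5))
nrow(df)           # 10
anyDuplicated(df)  # 0, so every row is a distinct key combination
```

Because `expand.grid` is applied only to the selected subsets, the result has exactly as many rows as the product of the counts, which avoids the memory cost of expanding every key and subsetting afterwards.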
