TcM
TcM

Reputation: 63

All possible permutations for large n

I want to find all possible permutations for large n using R. At the moment I am using permutations(n,n) from gtools package, but for n>10 it is almost impossible; I get memory crashes due to the large number of permutations (n!). I do not want to sample as I need to find the exact distribution for a particular statistic. Is there any way I can do this faster or that I can break it down into small blocks?

Upvotes: 1

Views: 920

Answers (1)

Ben Bolker
Ben Bolker

Reputation: 226332

Your goal is very likely to be impractical (how large is "large n"??? even if you can generate an enormous number of permutations, how long is it going to take you to summarize over them? How much difference in accuracy is there going to be between an exhaustive computation on a billion elements and a random sample of ten million of them?). However:

The iterpc package can enumerate permutations in blocks. For example:

library("iterpc")

Set up an object ("iterator") to generate permutations of 10 objects:

I <- iterpc(10,labels=1:10,ordered=TRUE)

Return the first 5 permutations:

getnext(I,5)
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,]    1    2    3    4    5    6    7    8    9    10
## [2,]    1    2    3    4    5    6    7    8   10     9
## [3,]    1    2    3    4    5    6    7    9    8    10
## [4,]    1    2    3    4    5    6    7    9   10     8
## [5,]    1    2    3    4    5    6    7   10    8     9

Return the next 5 permutations:

getnext(I,5)
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,]    1    2    3    4    5    6    7   10    9     8
## [2,]    1    2    3    4    5    6    8    7    9    10
## [3,]    1    2    3    4    5    6    8    7   10     9
## [4,]    1    2    3    4    5    6    8    9    7    10
## [5,]    1    2    3    4    5    6    8    9   10     7

Assuming you can compute your statistic one block at a time and then combine the results, this should be feasible. It doesn't look like you can parallelize very easily, though: there's no way to jump to a particular element of an iterator ... The numperm function from the sna package provides "random" (i.e. non-sequential) access to permutations, although in a different ordering from those given by iterpc - but I'm guessing that iterpc is much more efficient, so you may be better off crunching through blocks sequentially on a single node/core/machine rather than distributing the process.

Here are the first 5 permutations as given by sna::numperm:

library("sna")
t(sapply(1:5,numperm,olength=10))
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,]    2    1    3    4    5    6    7    8    9    10
## [2,]    2    3    1    4    5    6    7    8    9    10
## [3,]    2    3    4    1    5    6    7    8    9    10
## [4,]    2    3    4    5    1    6    7    8    9    10
## [5,]    2    3    4    5    6    1    7    8    9    10

The guts of iterpc are written in C++, so it should be very efficient, but no matter what things are going to get hard for larger values of n. To my surprise, iterpc can handle the full set of 10!=3628800 permutations without much trouble:

system.time(g <- getall(I))
##    user  system elapsed 
##   0.416   0.304   0.719 
dim(g)
## [1] 3628800      10

However, I can't do any computations with n>10 in a single block on my machine (n=11: "cannot allocate vector of size 1.6 Gb" ... n>11 "The length of the iterator is too large, try using getnext(I,d)")

Upvotes: 7

Related Questions