Joseph Wood
Joseph Wood

Reputation: 7608

R Algorithm for generating all possible factorizations of a number

For example, consider the number 96. It can be written in following ways:

1. 96
2. 48 * 2
3. 24 * 2 * 2
4. 12 * 2 * 2 * 2
5. 6 * 2 * 2 * 2 * 2
6. 3 * 2 * 2 * 2 * 2 * 2
7. 4 * 3 * 2 * 2 * 2
8. 8 * 3 * 2 * 2
9. 6 * 4 * 2 * 2
10. 16 * 3 * 2
11. 4 * 4 * 3 * 2
12. 12 * 4 * 2
13. 8 * 6 * 2
14. 32 * 3
15. 8 * 4 * 3
16. 24 * 4
17. 6 * 4 * 4
18. 16 * 6
19. 12 * 8

I know this is related to partitions as any number written as the power, n, of a single prime, p, is simply the number of ways you can write n. For example, to find all of the factorizations of 2^5, we must find all ways to write 5. They are:

  1. 1+1+1+1+1 ==>> 2^1 * 2^1 * 2^1 * 2^1 * 2^1
  2. 1+1+1+2 ==>> 2^1 * 2^1 * 2^1 * 2^2
  3. 1+1+3 ==>> 2^1 * 2^1 * 2^3
  4. 1+2+2 ==>> 2^1 * 2^2 * 2^2
  5. 1+4 ==>> 2^1 * 2^4
  6. 2+3 ==>> 2^2 * 2^3
  7. 5 ==>> 2^5

I found a wonderful article by Jerome Kelleher about partition generating algorithms here. I have adapted one of his python algorithms to R. The code is below:

library(partitions)   ## using P(n) to determine number of partitions of an integer
IntegerPartitions <- function(n) {
    a <- 0L:n
    k <- 2L
    a[2L] <- n
    MyParts <- vector("list", length=P(n))
    count <- 0L
    while (!(k==1L)) {
        x <- a[k-1L]+1L
        y <- a[k]-1L
        k <- k-1L
        while (x<=y) {a[k] <- x; y <- y-x; k <- k+1L}
        a[k] <- x+y
        count <- count+1L
        MyParts[[count]] <- a[1L:k]
    }
    MyParts
}

I attempted to extend this method to numbers with more one than one prime factor, but my code got very clunky. After wrestling with this idea for a while, I decided to try a different route. My new algorithm makes no use of generating partitions whatsoever. It is more of a "lookback" algorithm that takes advantage of factorizations that have already been generated. The code is below:

FactorRepresentations <- function(n) {

MyFacts <- EfficientFactorList(n)
MyReps <- lapply(1:n, function(x) x)

    for (k in 4:n) {
        if (isprime(k)) {next}
        myset <- MyFacts[[k]]
        mylist <- vector("list")
        mylist[[1]] <- k
        count <- 1L
            for (j in 2:ceiling(length(myset)/2)) {
                count <- count+1L
                temp <- as.integer(k/myset[j])
                myvec <- sort(c(myset[j], temp), decreasing=TRUE)
                mylist[[count]] <- myvec
                MyTempRep <- MyReps[[temp]]

                if (isprime(temp) || temp==k) {next}

                if (length(MyTempRep)>1) {
                    for (i in 1:length(MyTempRep)) {
                        count <- count+1L
                        myvec <- sort(c(myset[j], MyTempRep[[i]]), decreasing=TRUE)
                        mylist[[count]] <- myvec
                    }
                }
            }
        MyReps[[k]] <- unique(mylist)
    }
    MyReps
}

The first function in the code above is simply a function that generates all factors. Here is the code if you are curious:

EfficientFactorList <- function(n) {
    MyFactsList <- lapply(1:n, function(x) 1)
    for (j in 2:n) {
        for (r in seq.int(j, n, j)) {MyFactsList[[r]] <- c(MyFactsList[[r]], j)}
    }
    MyFactsList
}

My algorithm is just okay if you are only concerned with numbers less than 10,000 (it generates all factorizations for every number <= 10,000 in about 17 seconds), but it definitely doesn't scale well. I would like to find an algorithm that has the same premise of generating a list of all factorizations for every number less than or equal to n as some of the applications I have in mind will reference a given factorization multiple times, thus having it in a list should be faster than generating it on the fly everytime (I know there is a memory cost here).

Upvotes: 8

Views: 507

Answers (2)

Joseph Wood
Joseph Wood

Reputation: 7608

In case anybody is interested in generating the multiplicative partitions for one number n, below are two algorithms that will do just that (the function IntegerPartition comes from the question above):

library(gmp)
library(partitions)
get_Factorizations1 <- function(MyN) {
    pfs <- function (x1) {
        n1 <- length(x1)
        y1 <- x1[-1L] != x1[-n1]
        i <- c(which(y1), n1)
        list(lengths = diff(c(0L, i)), values = x1[i], uni = sum(y1)+1L)
    }

    if (MyN==1L) return(MyN)
    else {
        pfacs <- pfs(as.integer(factorize(MyN)))
        unip <- pfacs$values
        pv <- pfacs$lengths
        n <- pfacs$uni
        mySort <- order(pv, decreasing = TRUE)
        pv <- pv[mySort]
        unip <- unip[mySort]
        myReps <- lapply(IntegerPartitions(pv[1L]), function(y) unip[1L]^y)
        if (n > 1L) {
            mySet <- unlist(lapply(2L:n, function(x) rep(unip[x],pv[x])))
            for (p in mySet) {
                myReps <- unique(do.call(c,
                    lapply(myReps, function(j) { 
                        dupJ <- duplicated(j)
                        nDupJ <- !dupJ
                        SetJ <- j[which(nDupJ)]
                        lenJ <- sum(nDupJ)
                        if (any(dupJ)) {v1 <- j[which(dupJ)]} else {v1 <- vector(mode="integer")}
                        tList <- vector("list", length=lenJ+1L)
                        tList[[1L]] <- sort(c(j,p))

                        if (lenJ > 1L) {c2 <- 1L
                            for (a in 1:lenJ) {tList[[c2 <- c2+1L]] <- sort(c(v1,SetJ[-a],SetJ[a]*p))}
                        } else {
                            tList[[2L]] <- sort(c(v1,p*SetJ))
                        }
                        tList
                    }
                )))
            }
        }
    }
    myReps
}

Below is josliber's code from above manipulated to handle a single case. The function MyFactors comes from this post (it returns all of the factors of a given number).

library(gmp)
get_Factorizations2 <- function(n) {
    myFacts <- as.integer(MyFactors(n))
    facts <- lapply(myFacts, function(x) 1L)
    numFacs <- length(myFacts)
    facts[[numFacs]] <- myFacts
    names(facts) <- facts[[numFacs]]
    for (j in 2L:numFacs) {
        x <- myFacts[j]
        if (isprime(x)>0L) {
            facts[[j]] <- list(x)
        } else {
            facts[[j]] <- myFacts[which(x%%myFacts[myFacts <= x]==0L)]
            x.facts <- facts[[j]][facts[[j]] != 1 & facts[[j]] <= (x^0.5+0.001)]
            allSmaller <- lapply(x.facts, function(pf) lapply(facts[[which(names(facts)==(x/pf))]], function(y) {
                if (all(y >= pf)) {
                    return(c(pf, y))
                } else {
                    return(NULL)
                }
            }))
            allSmaller <- do.call(c, allSmaller)
            facts[[j]] <- c(x, allSmaller[!sapply(allSmaller, function(y) is.null(y))])
        }
    }
    facts[[numFacs]]
}

Here are some benchmarks:

set.seed(101)
samp <- sample(10^7, 10^4)
library(rbenchmark)
benchmark(getFacs1=sapply(samp, get_Factorizations),
            getFacs2=sapply(samp, get_Factorizations2),
            replications=5,
            columns = c("test", "replications", "elapsed", "relative"),
            order = "relative")
test replications elapsed relative
1 getFacs1            5  117.68    1.000
2 getFacs2            5  216.39    1.839


system.time(t2 <- get_Factorizations(25401600))
 user  system elapsed 
10.89    0.03   10.97
system.time(t2 <- get_Factorizations2(25401600))
 user  system elapsed 
21.08    0.00   21.12 

length(t1)==length(t2)
[1] TRUE

object.size(t1)
28552768 bytes
object.size(t2)
20908768 bytes

Even though get_Factorizations1 is faster, the second method is more intuitive (see josliber's excellent explanation above) and it produces a smaller object. For the interested reader, here is a really good paper about the subject.

Upvotes: 0

josliber
josliber

Reputation: 44340

Your function EfficientFactorList does a good job of efficiently grabbing the set of all factors for each number from 1 to n, so all that remains is getting the set of all factorizations. As you suggest, using the factorizations of smaller values to compute the factorizations for larger values seems like it could be efficient.

Consider a number k, with factors k_1, k_2, ..., k_n. A naive approach would be to combine the factorizations of k/k_1, k/k_2, ..., k/k_n, appending k_i to each factorization of k/k_i to yield a factorization of k. As a worked example, consider computing the factorizations of 16 (which has non-trivial factors 2, 4, and 8). 2 has factorization {2}, 4 has factorizations {4, 2*2}, and 8 has factorizations {8, 4*2, 2*2*2}, so we would compute the full set of factorizations by first computing {2*8, 4*4, 2*2*4, 8*2, 4*2*2, 2*2*2*2} and then taking the unique factorizations, {8*2, 4*4, 4*2*2, 2*2*2*2}. Adding 16 yields the final answer.

A more efficient approach is to notice that we don't need to append k_i to all the factorizations of k/k_i. For instance, we didn't need to add 2*2*4 from the factorization of 4 because this is already included from the factorization of 8. Similarly, we didn't need to add 2*8 from the factorization of 2 because this is already included from the factorization of 8. In general, we only need to include a factorization from k/k_i if all the values in the factorization are k_i or greater.

In code:

library(gmp)
all.fact <- function(n) {
  facts <- EfficientFactorList(n)
  facts[[1]] <- list(1)
  for (x in 2:n) {
    if (length(facts[[x]]) == 2) {
      facts[[x]] <- list(x)  # Prime number
    } else {
      x.facts <- facts[[x]][facts[[x]] != 1 & facts[[x]] <= (x^0.5+0.001)]
      allSmaller <- lapply(x.facts, function(pf) lapply(facts[[x/pf]], function(y) {
        if (all(y >= pf)) {
            return(c(pf, y))
        } else {
            return(NULL)
        }
      }))
      allSmaller <- do.call(c, allSmaller)
      facts[[x]] <- c(x, allSmaller[!sapply(allSmaller, function(y) is.null(y))])
    }
  }
  return(facts)
}

This is a good deal quicker than the posted code:

system.time(f1 <- FactorRepresentations(10000))
#    user  system elapsed 
#  13.470   0.159  13.765 
system.time(f2 <- all.fact(10000))
#    user  system elapsed 
#   1.602   0.028   1.641 

As a sanity check, it also returns the same number of factorizations for each number:

lf1 <- sapply(f1, length)
lf2 <- sapply(f2, length)
all.equal(lf1, lf2)
# [1] TRUE

Upvotes: 5

Related Questions