Killian

Reputation: 89

Changing for loops to apply in r

I have a random graph simulation in a function that takes n nodes and a preferential attachment parameter beta. It currently uses for loops, and when n is very large the code takes a long time to run. Is it possible to use the apply family to make this more efficient?

binfunction <- function(y) { # Pick a bin to create preferential attachment
  L <- length(y)
  x <- c(0, cumsum(y))                 # cumulative bin boundaries
  U <- runif(1, min = 0, max = sum(y)) # uniform draw over the total weight
  for (i in 1:L) {
    if (x[i] <= U && x[i + 1] > U) {
      return(i)
    }
  }
}
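As a side note (not part of the asker's code): the bin search in binfunction can be done without the explicit loop, since base R's findInterval() performs the same left-closed interval lookup in compiled code. A minimal sketch (binfunction2 is a hypothetical name):

```r
# Sketch of a loop-free binfunction: findInterval(U, x) returns the i
# with x[i] <= U < x[i + 1], matching the loop's condition.
binfunction2 <- function(y) {
  x <- c(0, cumsum(y))
  U <- runif(1, min = 0, max = sum(y))
  findInterval(U, x)
}
```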

random_graph <- function(n, beta) { # Random graph function
  mat <- matrix(0, n, n)
  mat[1, 2] <- 1 # start from a single edge between nodes 1 and 2
  mat[2, 1] <- 1
  for (i in 3:n) {
    degvect <- colSums(mat[, 1:(i - 1)]) # degrees of existing nodes
    degvect <- degvect^beta              # preferential attachment weights
    j <- binfunction(degvect)            # choose a node to attach to
    mat[i, j] <- 1
    mat[j, i] <- 1
  }
  return(mat)
}

And it can be used with:

set.seed(123)
random_graph(10, 0.5)
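For what it's worth, the apply family won't help much here, because each iteration depends on the state left by the previous one. The bigger wins are (a) updating a degree vector incrementally instead of calling colSums() on a growing submatrix each pass, and (b) drawing the attachment target with a single weighted sample.int() call, which does the same job as binfunction. A hedged sketch (random_graph_fast is a hypothetical name, not the original function):

```r
# Sketch: same simulation, but degrees are maintained incrementally and
# sample.int(..., prob = w) draws index j with probability proportional
# to w[j], replacing the cumulative-sum loop in binfunction.
random_graph_fast <- function(n, beta) {
  mat <- matrix(0, n, n)
  mat[1, 2] <- mat[2, 1] <- 1
  deg <- c(1, 1, rep(0, n - 2))      # degree of each node so far
  for (i in 3:n) {
    w <- deg[1:(i - 1)]^beta         # attachment weights
    j <- sample.int(i - 1, 1, prob = w)
    mat[i, j] <- mat[j, i] <- 1
    deg[i] <- deg[i] + 1
    deg[j] <- deg[j] + 1
  }
  mat
}
```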

Upvotes: 0

Views: 63

Answers (2)

akuiper

Reputation: 214957

Consider using a sparse matrix when dealing with large graph problems. A dense matrix can be expensive in both computation and memory. Here is a brief test, replacing your matrix with sparseMatrix from the Matrix package:

 library(Matrix)
 psidom_random_graph <- function(n, beta) { # Random graph function
   mat <- sparseMatrix(dims = c(n, n), i = integer(0), j = integer(0))
   mat[1, 2] <- TRUE
   mat[2, 1] <- TRUE
   for (i in 3:n) {
     degvect <- colSums(mat[, 1:(i - 1)])
     degvect <- degvect^beta
     j <- binfunction(degvect)
     mat[i, j] <- TRUE
     mat[j, i] <- TRUE
   }
   return(mat)
 }

> microbenchmark(psidom_random_graph(1000, 1), killian_random_graph(1000, 1))
Unit: seconds
                          expr      min       lq     mean   median       uq      max neval
  psidom_random_graph(1000, 1) 1.575727 1.593401 1.637430 1.602013 1.666144 1.971281   100
 killian_random_graph(1000, 1) 7.031790 7.092761 7.212975 7.192756 7.308052 7.552074   100
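Going a step further (a sketch of my own, not the asker's code): since the loop only ever needs degrees, you can avoid indexing into the matrix inside the loop entirely, collect an edge list, and build the sparse matrix once at the end. Here edge_random_graph is a hypothetical name, and sample.int(..., prob = w) plays the role of binfunction:

```r
library(Matrix)

# Sketch: track degrees in a plain vector, record each new edge in an
# edge list, and construct the sparse adjacency matrix in one call.
edge_random_graph <- function(n, beta) {
  from <- integer(n - 2)
  to   <- integer(n - 2)
  deg  <- c(1, 1, rep(0, n - 2))   # node degrees, starting from edge 1-2
  for (i in 3:n) {
    w <- deg[1:(i - 1)]^beta       # attachment weights
    j <- sample.int(i - 1, 1, prob = w)
    from[i - 2] <- i
    to[i - 2]   <- j
    deg[i] <- deg[i] + 1
    deg[j] <- deg[j] + 1
  }
  # store both directions so the matrix is symmetric
  sparseMatrix(i = c(1, from, 2, to), j = c(2, to, 1, from), dims = c(n, n))
}
```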

Upvotes: 0

Worice

Reputation: 4037

If you are looking for efficiency, vectorization is key.

The main time saver (at least for big n) is the use of sample.

For example:

n <- 1e7                             
sample(0:1, n, replace=TRUE) 

takes about 0.2 seconds, while

for(i in 1:n) sample(0:1, 1)

takes about 24 seconds. Vectorised operations can often replace loops, but knowing when and where depends on your familiarity with the functions available for your needs.
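Applied to the question, the same idea replaces the whole cumulative-sum loop in binfunction with one weighted draw (a sketch; the weights y here are illustrative, not from the asker's simulation):

```r
# Sketch: sample.int() draws index j with probability proportional to y[j],
# which is exactly the distribution the binfunction loop implements.
y <- c(2, 1, 7)                         # illustrative degree weights
j <- sample.int(length(y), 1, prob = y)
```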

Upvotes: 0
