Marina Cuesta

Reputation: 11

Problem with parallel computing over an index using foreach and %dopar% in R

I am trying to efficiently code the computation of the KDN complexity measure, which involves looping over all rows of a distance matrix and performing some computations on each of them.
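
For reference, the instance-level KDN value of a point x_i, as implemented below, is the fraction of its k nearest neighbours whose class label differs from that of x_i:

kdn(x_i) = #{ x_j in kNN(x_i) : y_j != y_i } / k

where k is chosen per class as a proportion (here 5%) of the class size.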

I am trying to parallelize this code with the foreach and %dopar% functions, but I am not achieving any reduction in running time. I am aware that some parallel computations are inefficient because of memory management, but I don't know whether that is the case here or whether I am doing something wrong.

This is a reproducible example with the digits data from the rsvd package.

First, I load all the necessary packages, read the digits data, and extract some useful information.

#######################
### NEEDED PACKAGES ###
#######################

library(dplyr)
library(parallelDist)
# for Parallel Processing
library(doParallel)  
library(foreach)
# for digits data
library(rsvd)


#############
###  DATA ###
#############

data(digits)
data = as.data.frame(digits)

# Dividing into the X variables and the Y target
dataX = data %>%
  dplyr::select(-label)
dataY = data %>%
  mutate(label=factor(label)) %>% 
  pull(label)

## number of observations
n=dim(dataX)[1]

Then, I perform some computations that are needed before the KDN loop I want to parallelize.

##############################
###  PREVIOUS COMPUTATIONS ###
##############################

## number of available data in each class
n_data_classes=table(factor(dataY))

## proportion of data in each class to be considered as neighbours
k=0.05
## number of neighbours in each class
k_neighbours_classes=ceiling(n_data_classes*k)

##  DISTANCE MATRIX COMPUTATION
# this is time-consuming, but I am not concerned about it
distance_matrix=as.matrix(parDist(scale(dataX)))
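
Since memory management might be the issue, a quick sanity check of the size of the full n x n matrix is useful (output omitted; it depends on the data):

# how much memory the distance matrix takes up
dim(distance_matrix)
format(object.size(distance_matrix), units = "Mb")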

The KDN computation without parallelization is the following one, and it takes 12 seconds.

#########################################
### COMPUTING KDN: NO PARALLELIZATION ###
#########################################

## KDN instance level computation
# initialization of a vector to store KDN instance level values
kdn_instance=numeric(n)

system.time(

  for (ix in 1:n){
    ## Getting the class of the ix data point
    class_ix=dataY[ix]
    ## number of data to be considered as neighbours in this class
    k_value=k_neighbours_classes[class_ix]
    
    # we get the k_value nearest neighbors set of ix
    distances_ix=distance_matrix[ix,]
    distances_ix_ordered=order(distances_ix,decreasing = F)
    knn_set_ix=distances_ix_ordered[2:(k_value+1)]
    
    # Y value of the k_neighbors_set_ix
    Y_value_knn_set_ix=dataY[knn_set_ix]
    # Y value of ix data
    Y_value_ix=dataY[ix]
    
    # number of data in knn_set_ix with a different Y value than ix
    knn_set_ix_different_Y_value=length(Y_value_knn_set_ix[Y_value_knn_set_ix!=Y_value_ix])
    kdn_instance[ix]=knn_set_ix_different_Y_value/k_value
  }
)

# user  system elapsed 
# 12.29    0.37   12.67 secs

My attempt to parallelize that loop, using foreach and %dopar%, is the following one; it takes 35 seconds.

######################################
### COMPUTING KDN: PARALLELIZATION ###
######################################

## Preparing for paralleling
# number of cores to use
n.cores <- parallel::detectCores() - 1

# we define the cluster
my.cluster <- parallel::makeCluster(n.cores,type = "PSOCK")

# register it to be used by %dopar%
doParallel::registerDoParallel(cl = my.cluster)

## KDN instance level computation
kdn_instance = NULL

# iterator over row indices: the loop body indexes dataY and distance_matrix
# with ix, so the iterator must yield 1, 2, ..., n rather than the matrix rows
# themselves (icount() comes from the iterators package, attached by doParallel)
itx <- icount(n)

system.time(

  kdn_instance <- foreach(
    ix = itx,
    .combine = 'c'
  ) %dopar% {
    ## Getting the class of the ix data point
    class_ix=dataY[ix]
    ## number of data to be considered as neighbours in this class
    k_value=k_neighbours_classes[class_ix]
    
    # we get the k_value nearest neighbors set of ix
    distances_ix=distance_matrix[ix,]
    distances_ix_ordered=order(distances_ix,decreasing = F)
    knn_set_ix=distances_ix_ordered[2:(k_value+1)]
    
    # Y value of the k_neighbors_set_ix
    Y_value_knn_set_ix=dataY[knn_set_ix]
    # Y value of ix data
    Y_value_ix=dataY[ix]
    
    # number of data in knn_set_ix with a different Y value than ix
    knn_set_ix_different_Y_value=length(Y_value_knn_set_ix[Y_value_knn_set_ix!=Y_value_ix])
    knn_set_ix_different_Y_value/k_value
  }
)
parallel::stopCluster(cl = my.cluster)
  

# user  system elapsed 
# 12.38    4.64   35.14 secs

As can be seen, the parallel computation takes more time than the non-parallelized one.

My question is: is there something wrong with my parallel processing code? Is there a better way to do it? Or should it perhaps be done with another package?

Upvotes: 1

Views: 180

Answers (1)

M.Viking

Reputation: 5398

I attempted to run your code, and my computer froze and ran out of memory.

Three ideas for you. First, do consider making a new question as the community bot suggested, and include information about your system, since parallel processing works differently on different systems; the snippet below shows a few commands that gather the relevant details.
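
For example (just a minimal sketch), the output of these commands describes a machine well enough for a performance question:

# basic system information to include in a question
parallel::detectCores()               # number of logical cores
Sys.info()[c("sysname", "release")]   # operating system and version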

Second, as you are seeking performance, be sure to optimize your system, for example by using an optimized BLAS/LAPACK such as OpenBLAS: https://csantill.github.io/RPerformanceWBLAS/
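
On recent versions of R, sessionInfo() prints which BLAS and LAPACK libraries R is linked against, so you can check whether the optimized libraries are actually in use:

# the BLAS/LAPACK paths are printed near the top of the output
sessionInfo()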

Finally, test basic %dopar% functionality as outlined in Chapter 5, "A more serious example", of the doParallel vignette: https://cran.r-project.org/web/packages/doParallel/vignettes/gettingstartedParallel.pdf

library(doParallel)
registerDoParallel(cores=2) ## test with different number of cores
# cl <- makeCluster(2) ## Test also different number of clusters
# registerDoParallel(cl) 

x <- iris[which(iris[, 5] != "setosa"), c(1, 5)]
trials <- 10000

ptime <- system.time({
  r <- foreach(icount(trials), .combine = cbind) %dopar% {
    ind <- sample(100, 100, replace = TRUE)
    result1 <- glm(x[ind, 2] ~ x[ind, 1], family = binomial(logit))
    coefficients(result1)
  }
})[3]
ptime
# stopCluster(cl)

stime <- system.time({
  r <- foreach(icount(trials), .combine = cbind) %do% {
    ind <- sample(100, 100, replace = TRUE)
    result1 <- glm(x[ind, 2] ~ x[ind, 1], family = binomial(logit))
    coefficients(result1)
  }
})[3]
stime

Results on my old laptop:

Description                                     Elapsed seconds
stime, baseline %do% method                              21.479
registerDoParallel(cores=1)                              20.425
registerDoParallel(cores=2)                              11.508
registerDoParallel(cores=3)                              11.107
registerDoParallel(cores=4)                               9.491
cl <- makeCluster(1); registerDoParallel(cl)             26.265
cl <- makeCluster(2); registerDoParallel(cl)             14.66
cl <- makeCluster(3); registerDoParallel(cl)             12.827
cl <- makeCluster(4); registerDoParallel(cl)             11.943
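
Once the basics work, note that a common reason a foreach loop like yours ends up slower than the sequential version is per-task overhead: with one task per row, scheduling and communication can dwarf the small amount of work each task does. Below is a rough sketch (untested on your data, and assuming the itertools package is installed) that hands each worker one chunk of row indices instead of one row at a time:

# chunked version of the KDN loop: one task per worker instead of one per row
library(itertools)  # for isplitVector(); not part of your original code

kdn_instance <- foreach(ixs = isplitVector(1:n, chunks = n.cores),
                        .combine = 'c') %dopar% {
  sapply(ixs, function(ix) {
    # number of neighbours for the class of point ix
    k_value <- k_neighbours_classes[dataY[ix]]
    # nearest neighbours of ix (position 1 is ix itself, so it is skipped)
    knn_set_ix <- order(distance_matrix[ix, ])[2:(k_value + 1)]
    # fraction of neighbours whose class label differs from that of ix
    mean(dataY[knn_set_ix] != dataY[ix])
  })
}

Even then, distance_matrix is still copied to every worker, and if that copy dominates the runtime, no foreach formulation of this loop will beat the sequential version by much.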

Upvotes: 1
