Reputation: 11
I am trying to efficiently code the computation of the KDN complexity measure, which involves a loop over all of the rows of a distance matrix and making some computations out of it.
I am trying to parallel this code with foreach
and %dopar%
functions, but I am not achieving any running time reduction. I am conscious that some parallel computations are not efficient because of memory management, but I don’t know if this is my case or if I am doing something wrong.
This is a reproducible example with digits data from rsvd
package:
First, I call all the necessary packages, I read the digits data and then I get some useful information.
#######################
### NEEDED PACKAGES ###
#######################
library(dplyr)
library(parallelDist)
# for Parallel Processing
library(doParallel)
library(foreach)
# for digits data
library(rsvd)
#############
### DATA ###
#############
data(digits)
data = as.data.frame(digits)
# Dividing on X variables and Y target
dataX = data %>%
dplyr::select(-label)
dataY = data %>%
mutate(label=factor(label)) %>%
pull(label)
## number of data
n=dim(dataX)[1]
Then, I do some necessary computations prior to the KDN loop I want to efficiently parallel.
##############################
### PREVIOUS COMPUTATIONS ###
##############################
## number of available data in each class
n_data_classes=table(factor(dataY))
## number of data to be considered as neighbours in each class
k=0.05
k_neighbours_classes=ceiling(n_data_classes*k)
## DISTANCE MATRIX COMPUTATION
# this is time consuming but I'm not concerned about this
distance_matrix=as.matrix(parDist(scale(dataX)))
The KDN computation without no parallelization is the next one and it takes 12 secs.
#########################################
### COMPUTING KDN: NO PARALLELIZATION ###
#########################################
## KDN instance level computation
# inicialization of a vector to store KDN instance level values
kdn_instance=numeric(n)
system.time(
for (ix in 1:n){
## Gettig the class of ix data point
class_ix=dataY[ix]
## number of data to be considered as neighbours in this class
k_value=k_neighbours_classes[class_ix]
# we get the k_value nearest neighbors set of ix
distances_ix=distance_matrix[ix,]
distances_ix_ordered=order(distances_ix,decreasing = F)
knn_set_ix=distances_ix_ordered[2:(k_value+1)]
# Y value of the k_neighbors_set_ix
Y_value_knn_set_ix=dataY[knn_set_ix]
# Y value of ix data
Y_value_ix=dataY[ix]
# number of data in knn_set_ix with different Y value that ix
knn_set_ix_different_Y_value=length(Y_value_knn_set_ix[Y_value_knn_set_ix!=Y_value_ix])
kdn_instance[ix]=knn_set_ix_different_Y_value/k_value
}
)
# user system elapsed
# 12.29 0.37 12.67 secs
My attempt to parallel that loop is the following one, using foreach and %dopar%, which takes 35 secs.
######################################
### COMPUTING KDN: PARALLELIZATION ###
######################################
## Preparing for paralleling
# number of cores to use
n.cores <- parallel::detectCores() - 1
# we define the cluster and register it so it can be used by %dopar%
my.cluster <- parallel::makeCluster(n.cores,type = "PSOCK")
# register it to be used by %dopar%
doParallel::registerDoParallel(cl = my.cluster)
## KDN instance level computation
kdn_instance= NULL
#iterator
itx <- iter(distance_matrix, by = 'row')
system.time(
kdn_instance <- foreach(
ix = itx,
.combine = 'c'
) %dopar% {
## Gettig the class of ix data point
class_ix=dataY[ix]
## number of data to be considered as neighbours in this class
k_value=k_neighbours_classes[class_ix]
# we get the k_value nearest neighbors set of ix
distances_ix=distance_matrix[ix,]
distances_ix_ordered=order(distances_ix,decreasing = F)
knn_set_ix=distances_ix_ordered[2:(k_value+1)]
# Y value of the k_neighbors_set_ix
Y_value_knn_set_ix=dataY[knn_set_ix]
# Y value of ix data
Y_value_ix=dataY[ix]
# number of data in knn_set_ix with different Y value that ix
knn_set_ix_different_Y_value=length(Y_value_knn_set_ix[Y_value_knn_set_ix!=Y_value_ix])
knn_set_ix_different_Y_value/k_value}
)
parallel::stopCluster(cl = my.cluster)
# user system elapsed
# 12.38 4.64 35.14 secs
As can be seen, the parallel computation is taking more time than the not parallelized one.
My question is: is there something wrong with the parallel processing code? Is there a better way to do it? Maybe it should be done with another package.
Upvotes: 1
Views: 180
Reputation: 5398
Attempted to run your code, and my computer froze and ran out of memory.
Three ideas for you, first do consider making a new question as the community bot suggested. Include information about your system (parallel processing works different on different systems)
Second, as you are seeking performance be sure to optimize your system, for example, using openblas BLAS/LAPACK https://csantill.github.io/RPerformanceWBLAS/
Finally, test basic %dopar%
functionality as outlined in the documentation, Chapter 5. A more serious example https://cran.r-project.org/web/packages/doParallel/vignettes/gettingstartedParallel.pdf
library(doParallel)
registerDoParallel(cores=2) ## test with different number of cores
# cl <- makeCluster(2) ## Test also different number of clusters
# registerDoParallel(cl)
x <- iris[which(iris[,5] != "setosa"), c(1,5)]
trials <- 10000
ptime <- system.time({
r <- foreach(icount(trials), .combine=cbind) %dopar% {
ind <- sample(100, 100, replace=TRUE)
result1 <- glm(x[ind,2]~x[ind,1], family=binomial(logit))
coefficients(result1)
}
})[3]
ptime
# stopCluster(cl)
stime <- system.time({
r <- foreach(icount(trials), .combine=cbind) %do% {
ind <- sample(100, 100, replace=TRUE)
result1 <- glm(x[ind,2]~x[ind,1], family=binomial(logit))
coefficients(result1)
}
})[3]
stime
Results on my old laptop:
Description | Elapsed seconds |
---|---|
stime, baseline %do% method |
21.479 |
registerDoParallel(cores=1) |
20.425 |
registerDoParallel(cores=2) |
11.508 |
registerDoParallel(cores=3) |
11.107 |
registerDoParallel(cores=4) |
9.491 |
cl<-makeCluster(1); registerDoParallel(cl) |
26.265 |
cl<-makeCluster(2); registerDoParallel(cl) |
14.66 |
cl<-makeCluster(3); registerDoParallel(cl) |
12.827 |
cl<-makeCluster(4); registerDoParallel(cl) |
11.943 |
Upvotes: 1