Tony

Reputation: 793

Why does increasing the number of "cores" make a difference?

I am new to the concept of parallel computing. I am trying to apply it to a script in which a loop builds several regression models about 1,000 times and makes predictions each time based on those models' coefficients. The data sets in each case are large, and the models involve dummy codes and weights, which slow the process down even further. Hence, I am trying to use foreach instead of the 'for' loop.

I am trying to use the doParallel and foreach libraries and to set the number of cores with registerDoParallel(). I have a Windows 10 machine. My understanding is that calls like detectCores() and Sys.getenv('NUMBER_OF_PROCESSORS') return the number of "logical processors" rather than physical cores:

> detectCores()
  [1] 4
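Side note: if I understand the docs correctly, the parallel package can also report the physical core count directly, which on Windows distinguishes cores from logical processors:

library(parallel)
detectCores()                  # logical processors (hyper-threads included)
detectCores(logical = FALSE)   # physical cores only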

My Task Manager shows these specifications:

[screenshot: Task Manager]

I experimented a bit to find the "right"(?) number of cores to set with registerDoParallel() and realised that it will accept any number. Experimenting further, I found that the number passed actually makes a difference. I adapted the script below from the creators of these two libraries (p. 3) to compare serial with parallel execution for various numbers of cores.

x <- iris[which(iris[, 5] != "setosa"), c(1, 5)]
trials <- 10000

library(foreach)
library(doParallel)

# detectCores()
# Sys.getenv('NUMBER_OF_PROCESSORS')
registerDoParallel(cores = 4)
getDoParWorkers()

ptimes <- numeric(15)
stimes <- numeric(15)

# sequential runs (%do%)
for (i in 1:15) {
  stimes[i] <- system.time({
    r <- foreach(icount(trials), .combine = cbind) %do% {
      ind <- sample(100, 100, replace = TRUE)
      result1 <- glm(x[ind, 2] ~ x[ind, 1], family = binomial(logit))
      coefficients(result1)
    }
  })[3]
}

# parallel runs (%dopar%)
for (i in 1:15) {
  ptimes[i] <- system.time({
    r <- foreach(icount(trials), .combine = cbind) %dopar% {
      ind <- sample(100, 100, replace = TRUE)
      result1 <- glm(x[ind, 2] ~ x[ind, 1], family = binomial(logit))
      coefficients(result1)
    }
  })[3]
}

Here are the results, measured as the mean elapsed time in seconds per run of the loop, over the 15 repetitions. There seems to be a sweet spot at 12 "cores".

process      mean (s)   sd (s)
sequential     53.8      5.4
"2-core"       32.3      1.9
"4-core"       28.7      2.6
"12-core"      22.9      0.5
"24-core"      27.5      1.9

I even compared mean performance between, say, "2-core" and "12-core" with t-tests, and the differences are statistically significant rather than due to chance.
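For reference, a sketch of how I computed the summaries and t-tests from the timing vectors (ptimes_2 and ptimes_12 are assumed names for the 15 parallel timings collected with cores = 2 and cores = 12, respectively):

mean(stimes); sd(stimes)          # sequential summary
mean(ptimes_2); sd(ptimes_2)      # "2-core" summary (ptimes_2: assumed name)
mean(ptimes_12); sd(ptimes_12)    # "12-core" summary (ptimes_12: assumed name)
t.test(ptimes_2, ptimes_12)       # Welch two-sample t-test on the timings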

My questions are:

Is it good practice, based on the above, to be running my scripts in "12-core mode" when using code that can be parallelised?

I want to use a higher-performance computer to run my script; do I need to repeat this process there to find the optimal (i.e. fastest) setting?

Upvotes: 1

Views: 1683

Answers (1)

Patric

Reputation: 2131

In practice, it is best to use the same number of computing threads as hardware (physical) cores, which is 2 in your example.


More details:

If your workload is compute-intensive, running more threads than you have hardware cores makes them compete for resources and degrades performance. However, in some cases, such as your example, the workload involves a lot of memory access per computation, so extra threads can help hide memory latency. (The CPU is latency-oriented and can hide some of that latency automatically.) In your case, more than 2 threads gives some further improvement, but not much.

Therefore, rather than spending time on every system tuning how many threads to use for each run, it is better simply to use the number of hardware cores in your parallel program.
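For example, a minimal sketch that registers the physical core count instead of a hand-tuned number (detectCores(logical = FALSE) reports physical cores):

library(doParallel)
physical_cores <- parallel::detectCores(logical = FALSE)  # 2 on the asker's machine
registerDoParallel(cores = physical_cores)
getDoParWorkers()  # confirm the number of registered workers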

A good introduction to parallel computing with R can be found here.

Upvotes: 1
