Mark Miller

'all connections are in use' with parallel processing on AWS

I have been able to run 20 models simultaneously using an r6a.48xlarge Amazon Web Services instance (192 vCPUs, 1536 GiB of memory) and this R code:

setwd('/home/ubuntu/')

library(doParallel)

# how many vCPUs does this instance report?
detectCores()

my.AWS.n.cores <- detectCores()
my.AWS.n.cores <- my.AWS.n.cores - 92   # drop from 192 cores to 100 (see below)
my.AWS.n.cores

# create a PSOCK cluster and register it as the doParallel backend
registerDoParallel(my.cluster <- makeCluster(my.AWS.n.cores))

folderName <- 'model000222'

files <- list.files(folderName, full.names = TRUE)

start.time <- Sys.time()

# run each model script on its own worker; drop results from scripts that fail
foreach(file = files, .errorhandling = "remove") %dopar% {
  source(file)
}

stopCluster(my.cluster)

end.time <- Sys.time()
total.time.c <- end.time - start.time
total.time.c

However, the above R code did not run until I reduced the number of cores from 192 to 100 with this line:

my.AWS.n.cores <- my.AWS.n.cores - 92

If I tried running the code with all 192 vCPUs, or with 187 vCPUs, I got this error message:

> my.AWS.n.cores <- detectCores()
> my.AWS.n.cores <- my.AWS.n.cores - 5
> my.AWS.n.cores
[1] 187
> 
> registerDoParallel(my.cluster <- makeCluster(my.AWS.n.cores))
Error in socketConnection("localhost", port = port, server = TRUE, blocking = TRUE,  : 
  all connections are in use
Calls: registerDoParallel ... makePSOCKcluster -> newPSOCKnode -> socketConnection

I had never seen that error message and could not locate it with an internet search. Could someone explain it? I do not know why my workaround worked or whether a better solution exists. Can I easily determine the maximum number of connections I can use without triggering this error? I suppose I could rerun the code, incrementing the number of cores from 100 up toward 187.
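For what it's worth, one way I could check this empirically without launching any workers would be to open throwaway connections until R refuses and count them. This is only a rough sketch; probe.free.connections is just a name I made up, not an existing function:

# count how many more connections this session can open by opening
# textConnection()s until the "all connections are in use" error
# appears, then closing them all again
probe.free.connections <- function() {
  cons <- list()
  on.exit(lapply(cons, close), add = TRUE)  # always release the slots
  repeat {
    con <- tryCatch(textConnection("x"), error = function(e) NULL)
    if (is.null(con)) break                 # hit the connection limit
    cons[[length(cons) + 1]] <- con
  }
  length(cons)
}
probe.free.connections()   # presumably ~125 in a fresh session on a stock build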

I installed R on this instance with the lines below in PuTTY. R could not be located on the instance until I ran the last line below: apt install r-base-core.

sudo su
# append the CRAN repository to APT's source list
echo "deb http://cran.rstudio.com/bin/linux/ubuntu trusty/" >> /etc/apt/sources.list
sudo apt-get update
sudo apt-get install r-base
sudo apt install dos2unix
apt install r-base-core

I used this AMI:

Ubuntu Server 18.04 LTS (HVM), SSD Volume Type 

EDIT

Apparently, R has a hard-wired limit of 128 connections, and you can increase the number of PSOCK workers if you are willing to rebuild R from source, but I have not found an answer showing how to do that; ideally the answer would cover Ubuntu and AWS. See also these related questions:

Errors in makeCluster(multicore): cannot open the connection

Is there a limit on the number of slaves that R snow can create?


Answers (1)

HenrikB

Explanation

Each parallel PSOCK worker consumes one R connection. As of R 4.2.1, R is hard-coded to support only 128 open connections at any time. Three of those connections are always in use (stdin, stdout, and stderr), leaving you with 125 to play with.
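You can see those three in a fresh R session:

> showConnections(all = TRUE)[, "description"]
        0         1         2 
  "stdin"  "stdout"  "stderr" 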

To increase this limit, you have to update the constant:

#define NCONNECTIONS 128

in src/main/connections.c, and then re-build R from source. FWIW, I've verified that it works with at least 16,384 connections on Ubuntu 16.04 (https://github.com/HenrikBengtsson/Wishlist-for-R/issues/28#issuecomment-231603035).
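On Ubuntu, the rebuild could look roughly like this. This is only a sketch; the R version, the new limit of 1024, and the build-dep step are assumptions, not something verified on your exact AMI:

sudo apt-get build-dep r-base   # needs a deb-src line in sources.list
wget https://cran.r-project.org/src/base/R-4/R-4.2.1.tar.gz
tar xzf R-4.2.1.tar.gz && cd R-4.2.1
# raise the hard-coded limit, e.g. from 128 to 1024
sed -i 's/#define NCONNECTIONS 128/#define NCONNECTIONS 1024/' src/main/connections.c
./configure
make -j"$(nproc)"
sudo make install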

People have reported on this before, and the problem has been raised on R-devel several times over the years. The last time the limit was increased was in R 2.4.0 (October 2006), when it was raised from 50 to 128. See https://github.com/HenrikBengtsson/Wishlist-for-R/issues/28 for more details and discussions. I think it's worth bumping this topic again on R-devel. As people get access to more cores, more people will run into this problem.

The parallelly package provides two functions, availableConnections() and freeConnections(), for querying the current R installation for the number of connections available and free. See https://parallelly.futureverse.org/reference/availableConnections.html for details and examples.
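For example, in a fresh session on a stock build you should see something like:

> parallelly::availableConnections()
[1] 128
> parallelly::freeConnections()
[1] 125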

FYI, if you use parallelly::makeClusterPSOCK(n) instead of parallel::makeCluster(n), you'll get a more informative error message, and much sooner, e.g.

> cl <- parallelly::makeClusterPSOCK(192)
Error: Cannot create 192 parallel PSOCK nodes. Each node
needs one connection but there are only 124 connections left
out of the maximum 128 available on this R installation

Workaround

You can avoid relying on R connections for local parallel processing by using the callr package under the hood. The easiest way to achieve this is to use doFuture in combination with future.callr. In your example, that would be:

library(doFuture)
library(future.callr)

registerDoFuture()
plan(callr, workers = parallelly::availableCores(omit = 5))

...

With this setup, the parallel workers are launched via callr, which operates without R connections. Each parallel task is launched in a separate callr process, and when the task completes, the parallel worker is terminated. Because the parallel workers are not reused, there is extra overhead to using the callr backend, but if your parallel tasks are long enough, that should still be a minor part of the processing time.
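Concretely, with that setup in place, the foreach loop from your question should run unchanged, e.g. (using the folder name from your question):

files <- list.files('model000222', full.names = TRUE)
foreach(file = files, .errorhandling = "remove") %dopar% {
  source(file)
}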

