I'm trying to run some code that uses parallel processing, but I keep running into these errors:
Error: cannot allocate vector of size 2.1 Gb
Execution halted
Error in serialize(data, node$con) : error writing to connection
Calls: %dopar% ... postNode -> sendData -> sendData.SOCKnode -> serialize
Execution halted
Warning message:
system call failed: Cannot allocate memory
Error in unserialize(node$con) : error reading from connection
Calls: <Anonymous> ... doTryCatch -> recvData -> recvData.SOCKnode -> unserialize
Execution halted
I can't seem to figure out why there's a memory problem. If I take the code out of the foreach loop or change the foreach to a for loop, it works perfectly fine, so I don't think it has to do with the contents of the code itself, but rather something about the parallelization. Also, it seems to throw the error pretty soon after the code starts executing. Any ideas why this might be happening? Here's a look at my code:
list_storer <- list()
list_storer <- foreach(bt = 2:bootreps, .combine = list, .multicombine = TRUE) %dopar% {
    # resample rows with replacement for this bootstrap replicate
    ur <- sample.int(nrow(dailydatyr), nrow(dailydatyr), replace = TRUE)
    ddyr_boot <- dailydatyr[ur, ]
    weightvar <- ddyr_boot[, c('ymd1_IssueD', 'MatD_ymd2')]
    weightvar <- abs(weightvar)
    x <- DM[ur, ]
    y <- log(ddyr_boot$dirtyprice2 / ddyr_boot$dirtyprice1)
    # down-weight observations with longer gaps between the two dates
    weightings <- rep(1, nrow(ddyr_boot))
    weightings <- weightings / (ddyr_boot$datenum2 - ddyr_boot$datenum1)
    treg <- repeatsales(y, x, maxdailyreturn, weightings, weightvar)
    zbtcol <- 0
    cnst <- NULL
    if (is.null(dums) == FALSE) {
        zbtcol <- length(treg) - ncol(x)
        cnst <- paste("tbs(", dums, ")_", (middleyr), sep = "")
        if (is.null(interactVar) == FALSE) {
            ninteract <- (length(treg) - ncol(x) - length(dums)) / length(dums)
            interact <- unlist(lapply(cnst, function(xla) paste(xla, "*c", c(1:ninteract), sep = "")))
            cnst <- c(cnst, interact)
        }
    }
    tregtotal <- tregtotal + (is.na(treg) == FALSE)
    treg[is.na(treg) == TRUE] <- 0
    list_storer[[length(list_storer) + 1]] <- treg
}
stopImplicitCluster()
Parallelisation as done by foreach is a space vs. time trade-off: we get faster execution at the expense of higher memory usage. The reason for the higher memory usage is that several R processes are started, and each of them needs its own memory to hold the data necessary for the calculation. Currently foreach is using an implicit PSOCK cluster. One way to solve this is to make the cluster creation explicit, using a lower number of processes. How low depends on the amount of memory you have and on the memory requirements of each job:
n <- parallel::detectCores() / 2  # experiment!
cl <- parallel::makeCluster(n)
doParallel::registerDoParallel(cl)
# <foreach loop goes here>
parallel::stopCluster(cl)
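To make the pattern concrete, here is a minimal self-contained sketch. It is not your actual loop: the toy body and the fixed choice of 2 workers are placeholder assumptions to adapt. Note also that the last expression of the %dopar% body is what foreach hands to .combine, so there is no need to grow list_storer by hand inside the loop:

library(foreach)
library(doParallel)

# Explicit cluster with a deliberately small number of workers to cap
# total memory use (2 is an arbitrary starting point; tune to your RAM).
cl <- parallel::makeCluster(2)
registerDoParallel(cl)

results <- foreach(bt = 1:10, .combine = list, .multicombine = TRUE) %dopar% {
    x <- rnorm(1e4)  # stand-in for the per-iteration work
    mean(x)          # last expression is returned and collected by .combine
}

parallel::stopCluster(cl)

Starting with a small cluster and raising n while watching memory consumption is usually safer than starting from detectCores().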