Reputation: 155
As per the glmulti package documentation, chunk and chunks are the arguments for using multiple CPUs when doing an exhaustive screening.
But even when I set both chunk and chunks to 4 with method='h' and family='binomial', R only uses a single core.
The call I used:
glmulti(y ~ ., level = 1, data = ctrain, fitfunction = 'glm', chunk = 4, chunks = 4, method = 'h', family = 'binomial')
A demo data set similar to mine: https://archive.ics.uci.edu/ml/machine-learning-databases/00222/bank-additional.zip
PS: any other package that solves the problem is also acceptable.
Upvotes: 2
Views: 402
Reputation: 2738
## try chunk of chunks, per the vignette
## ('mod' stands for your model specification, i.e. the formula and data from the question):
chunk1of2 <- glmulti(mod,
                     level = 2,
                     method = "h",
                     marginality = TRUE,
                     name = "exhausting_glm",
                     chunks = 2,   ## split the candidate set into 2 chunks...
                     chunk = 1)    ## ...and fit chunk 1 of 2 in this call
## file = "|object" saves the glmulti object itself, named after its 'name' slot
write(chunk1of2, file = "|object")

chunk2of2 <- glmulti(mod,
                     level = 2,
                     method = "h",
                     marginality = TRUE,
                     name = "exhausting_glm",
                     chunks = 2,
                     chunk = 2)    ## fit chunk 2 of 2
write(chunk2of2, file = "|object")

## gather the saved chunk files and combine them into one glmulti object
fullobj <- consensus(as.list(list.files(pattern = "exhausting_glm")),
                     confsetsize = NA)
summary(fullobj)$bestmodel
These calls will save the files "exhausting_glm1.1" and "exhausting_glm1.2" in your current working directory, and consensus will grab them. Please note the as.list() in the consensus call -- it wasn't included in the vignette, but I needed it to prevent an error.
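Before splitting the job into chunks, you can ask glmulti how large the candidate set is. A minimal sketch, reusing the placeholder mod and the settings above; per the documentation, method = "d" only prints diagnostics (including the number of candidate models) without fitting anything:
## diagnostic run: reports the size of the candidate set
glmulti(mod, level = 2, marginality = TRUE, method = "d")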
Let's say you ran glmulti for your circumstances with method='d' to get the diagnostics, and the call reported that you had 327,680 candidate models. If you did the following, you would fit two formulae in each chunk. There are a variety of ways to distribute those chunks depending on your computing system/resources (one parallel option is sketched after the code below):
## try chunk of chunks, per the vignette:
chunk1of2 <- glmulti(mod,
                     level = 2,
                     method = "h",
                     marginality = TRUE,
                     name = "exhausting_glm",
                     chunks = 327680/2,   ## 163,840 chunks of two models each
                     chunk = 1)
write(chunk1of2, file = "|object")

chunk2of2 <- glmulti(mod,
                     level = 2,
                     method = "h",
                     marginality = TRUE,
                     name = "exhausting_glm",
                     chunks = 327680/2,
                     chunk = 2)
write(chunk2of2, file = "|object")

fullobj <- consensus(as.list(list.files(pattern = "exhausting_glm")),
                     confsetsize = NA)
## best of the 4 models fit out of the 327,680 possible
summary(fullobj)$bestmodel
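For example, to spread the chunks over the cores of one machine, you could launch the chunk calls yourself with the parallel package. This is only a minimal sketch, not a mechanism built into glmulti: it assumes a Unix-alike system (mclapply relies on forking; use parLapply on Windows), the formula and data from the question, a chunk count of 4 chosen purely for illustration, and that consensus() accepts in-memory glmulti objects as well as saved file names.
library(glmulti)
library(parallel)

n_chunks <- 4
## one forked worker per chunk; each glmulti() call fits only its share of the models
parts <- mclapply(seq_len(n_chunks), function(i) {
  glmulti(y ~ ., data = ctrain, level = 1,
          fitfunction = "glm", family = "binomial",
          method = "h", chunks = n_chunks, chunk = i,
          plotty = FALSE, report = FALSE)   ## keep the workers quiet
}, mc.cores = n_chunks)

## combine the per-chunk results into one glmulti object
fullobj <- consensus(parts, confsetsize = NA)
summary(fullobj)$bestmodel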
One thing to note regarding scaling and breaking big candidate sets into many smaller chunks -- I found that if method='d' reported a large number of models, then breaking the run into chunks was still computationally intensive, because each chunk had to "reinvent the wheel" and enumerate all the candidate models from scratch again -- and this takes time.
Upvotes: 0
Reputation: 11738
If you read the vignette (which you can download there), you will see that chunk determines only one part of the computation.
I think you just need to make the calls in a loop with chunk in seq_len(chunks) and combine the results.
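A minimal sketch of that loop, assuming the formula and data from the question, an arbitrary chunk count of 4, and that consensus() accepts a list of glmulti objects as well as file names; to actually use several cores you would run each iteration in its own R process or parallelize the loop:
library(glmulti)

n_chunks <- 4
results <- vector("list", n_chunks)
for (i in seq_len(n_chunks)) {
  ## each call enumerates the full candidate set but fits only its share of the models
  results[[i]] <- glmulti(y ~ ., data = ctrain, level = 1,
                          fitfunction = "glm", family = "binomial",
                          method = "h", chunks = n_chunks, chunk = i)
}
fullobj <- consensus(results, confsetsize = NA)
summary(fullobj)$bestmodel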
You should email the author or open an issue for further information.
Upvotes: 1