Reputation: 21
I have a dataframe with DNA barcodes as rownames, and I would like to compute the pairwise differences (e.g. the Levenshtein distance) between these barcodes. The values in the dataframe need to be processed later in the analysis. I've worked out an example that uses a slightly simplified comparison: it just counts mismatches between the individual bases (A, T, G, C) after a strsplit
and puts the results in a matrix:
results <- matrix(data=NA, nrow=nrow(dat), ncol=nrow(dat))
# Split each pair of barcodes into single bases and count the mismatches.
system.time(
  for (i in 1:nrow(dat)) {
    for (j in 1:nrow(dat)) {
      results[i, j] <- sum(unlist(strsplit(rownames(dat)[i], split="")) !=
                           unlist(strsplit(rownames(dat)[j], split="")))
    }
  }
)
This all works as expected but of course is embarrassingly parallel. To save some time and to put our university cluster to good use, I would like to try and parallelize this function, but I'm having trouble getting it right. Hints would be appreciated!
Upvotes: 0
Views: 81
Reputation: 179468
Parallelisation should be the last step in optimising your code; simpler optimisations, such as vectorisation and using built-in functions, come first.
In your case, you should use adist()
to compute the Levenshtein distance.
# Function to simulate a random barcode of a given length
g <- function(n) paste(sample(c("G", "A", "C", "T"), size=n, replace=TRUE), collapse="")
# Simulate five barcodes of length 4
barcodes <- replicate(5, g(n=4))
Then use adist() (since the barcodes are random, your exact values will differ):
barcodes
[1] "CTAA" "AGGC" "CACT" "GGCG" "TTGA"
adist(barcodes, barcodes)
[,1] [,2] [,3] [,4] [,5]
[1,] 0 4 3 4 2
[2,] 4 0 4 2 3
[3,] 3 4 0 3 4
[4,] 4 2 3 0 4
[5,] 2 3 4 4 0
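Note that adist(barcodes) with a single argument returns the same symmetric matrix, so a single call is enough.
If you ever have so many barcodes that even adist() is slow, you can still parallelise afterwards by computing the distance matrix in row chunks. A minimal sketch, assuming a Unix-alike machine (parallel::mclapply does not fork on Windows; use parLapply() with a cluster there):
library(parallel)

# Split the row indices into one contiguous chunk per core, compute each
# block of rows against all barcodes with adist(), then bind the blocks
# back together into the full distance matrix.
n_cores  <- detectCores()
chunks   <- splitIndices(length(barcodes), n_cores)
blocks   <- mclapply(chunks,
                     function(idx) adist(barcodes[idx], barcodes),
                     mc.cores = n_cores)
dist_mat <- do.call(rbind, blocks)
Because splitIndices() returns the chunks in order, rbind() reassembles the rows in their original order, and dist_mat matches the serial adist(barcodes) result.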
Upvotes: 1