Reputation: 619
After a previous discussion and help from F.Privé, I made some changes, and the following code now does what it is expected to do.
library(purrr)
library(parallel)
p_list = list( "P1" = list( c("MAKM1","MMERMTD","FTRWDSE" )) ,
               "P2" = list( c("MFFGGDSF1","DFRMDFMMGRSDFG","DSDMFFF")),
               "P3" = list( c("MDERTDF1","DFRGRSDFMMG","DMMMFFFS")),
               "P4" = list( c("MERTSDMDF1","SDFRGSSMRSDFG","DFFFM")))
chars <- set_names(c("M", "S", "M"), c("class.1", "class.35", "class.4"))
get_0_and_all_combn <- function(x) {
  map(seq_along(x), function(i) combn(as.list(x), i, simplify = FALSE)) %>%
    unlist(recursive = FALSE) %>%
    c(0L, .)
}
get_pos_combn <- function(x, chars) {
  x.spl <- strsplit(x, "")[[1]]
  isUni1 = grep("class.1", names(chars))
  isFirst = grepl("1",x)
  map2(.x=chars, .y=seq_along(chars), .f=function( chr, index ) {
    if( length(isUni1) != 0 ){
      if( index == isUni1 & isFirst == TRUE )
        1 %>% get_0_and_all_combn()
      else{
        which(x.spl == chr) %>%
          get_0_and_all_combn()
      }
    }else{
      which(x.spl == chr) %>%
        get_0_and_all_combn()
    }
  }) %>%
    expand.grid()
}
get_pos_combn_with_infos <- function(seq, chars, p_name) {
  cbind.data.frame(p_name, seq, get_pos_combn(seq, chars))
}
combine_all <- function(p_list, chars){
  i = 1
  fp <- as.data.frame(matrix(ncol = 5))
  colnames(fp) = c("p_name" ,"seq" , names(chars) )
  for(p in p_list){
    p_name = names(p_list)[i]
    for(d in 1:length(p[[1]])){
      seq = p[[1]][d]
      f = get_pos_combn_with_infos(seq, chars, p_name)
      # unlist the list wherever exist in the dataframe and collapse
      # its values with the ":" symbol.
      for(c in 1:nrow(f)){
        if(is.list(f[c,3]))
          f[c,3]=paste(unlist(f[c,3]),collapse=":")
        if(is.list(f[c,4]))
          f[c,4]=paste(unlist(f[c,4]),collapse=":")
        if(is.list(f[c,5]))
          f[c,5]=paste(unlist(f[c,5]),collapse=":")
      }
      fp = na.omit(rbind( f , fp ) )
    }
    i = i + 1
  }
  fp
}
numCores <- detectCores()
results = mcmapply(FUN=combine_all, MoreArgs=list(p_list , chars) , mc.cores = numCores-1)
The only thing one needs to run is the last function, combine_all(), giving the p_list and chars variables as inputs.
If this is done, the result is a data.frame that contains, for every string in p_list, all combinations across the characters defined in chars of all possible combinations of each character's positions inside that string.
I know it's a little bit complicated but I don't know another way to explain the results.
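To make the description above more concrete, here is a minimal base-R sketch of the idea for a single string and a single character. It is a simplification, not the code above: it skips purrr, the special handling of class.1, and the as.list() wrapping, and the names x, pos and subsets are only illustrative.

```r
# One string and one character, for illustration only.
x <- "MMERMTD"

# Positions of "M" inside the string.
pos <- which(strsplit(x, "")[[1]] == "M")
pos  # 1 2 5

# Every non-empty subset of those positions, preceded by the 0L placeholder
# that get_0_and_all_combn() also prepends.
# Note: combn() treats a single positive integer as seq_len(), so this plain
# version is only safe when pos has more than one element.
subsets <- c(
  list(0L),
  unlist(
    lapply(seq_along(pos), function(k) combn(pos, k, simplify = FALSE)),
    recursive = FALSE
  )
)
length(subsets)  # 8
```

With three positions there are 7 non-empty subsets plus the placeholder, hence 8 entries. The full result then crosses these per-character lists with expand.grid(), which is why the number of rows grows quickly.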
Anyway, because my actual list (p_list) is much larger than the one in the example above, I thought to run it in parallel on more than one CPU core at a time.
For that purpose, as you can see, I used the parallel package. I ran it on a Linux box (because, as I understand it, mcmapply uses fork to create the other processes), but the truth is that I didn't get any result, only an empty list.
Any ideas for improving the algorithm or making it run in parallel are welcome.
Thank you.
Upvotes: 1
Views: 266
Reputation: 11728
Here, the problem is how you use mapply. If you don't supply any arguments to vectorize over (the ... arguments), it is normal that it returns a list of length 0.
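A small sketch of that behaviour with base mapply, which takes its arguments the same way as mcmapply. The function count_chars is a hypothetical toy stand-in, not part of the question's code.

```r
# Hypothetical toy function: counts occurrences of ch in the string x.
count_chars <- function(x, ch) sum(strsplit(x, "")[[1]] == ch)

# Everything passed through MoreArgs: there is nothing to iterate over,
# so the result is an empty list.
empty <- mapply(FUN = count_chars, MoreArgs = list(x = "MAKM", ch = "M"))
length(empty)  # 0

# The strings are passed through ... and are vectorized over;
# ch stays constant via MoreArgs.
counts <- mapply(FUN = count_chars, c("MAKM", "DFFFM"), MoreArgs = list(ch = "M"))
counts
```

In the second call the function runs once per string, which is the kind of call mcmapply would need in order to have any work to distribute.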
I will use foreach because it's easier to work with. You can see this guide for parallelism in R with foreach.
Then combine_all becomes
combine_all <- function(p_list, chars) {
  p_names <- names(p_list)
  all_all_f <- foreach(i = seq_along(p_list)) %dopar% {
    p <- p_list[[i]][[1]]
    p_name <- p_names[i]
    all_f <- foreach(d = seq_along(p)) %do% {
      f <- get_pos_combn_with_infos(p[d], chars, p_name)
      # unlist the list wherever exist in the dataframe and collapse
      # its values with the ":" symbol.
      for(c in 1:nrow(f)){
        if(is.list(f[c,3]))
          f[c,3]=paste(unlist(f[c,3]),collapse=":")
        if(is.list(f[c,4]))
          f[c,4]=paste(unlist(f[c,4]),collapse=":")
        if(is.list(f[c,5]))
          f[c,5]=paste(unlist(f[c,5]),collapse=":")
      }
      f
    }
    do.call("rbind", all_f)
  }
  do.call("rbind", all_all_f)
}
Then you do
library(foreach)
doParallel::registerDoParallel(parallel::detectCores() - 1)
the_res_you_want <- combine_all(p_list = p_list, chars = chars)
doParallel::stopImplicitCluster()
On Linux and Mac, this registers fork clusters (mc-like). On Windows, this code is likely not to work.
PS1: beware that your data frame can be quite large if you parallelize over lots of elements.
PS2: you should really keep the data frames with column-lists rather than collapsing them into strings. See http://r4ds.had.co.nz/many-models.html#list-columns-1.
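As a rough illustration of PS2, here is a minimal sketch of a data frame that keeps positions in a list-column and only collapses them to strings when needed for display. The column names seq and pos and the values are made up for the example.

```r
# Illustrative data only: two strings and the positions of "M" in each.
df <- data.frame(seq = c("MAKM", "DFFFM"), stringsAsFactors = FALSE)

# A list-column: each row holds an integer vector of positions.
df$pos <- list(c(1L, 4L), 5L)

# The positions stay usable as numbers, e.g. to count them per row.
lengths(df$pos)  # 2 1

# Collapse to "a:b" strings only at the point where text is needed.
collapsed <- vapply(df$pos, paste, character(1), collapse = ":")
collapsed  # "1:4" "5"
```

Keeping the vectors avoids having to split the strings again later, and it removes the need for the row-by-row loop that pastes each cell.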
Upvotes: 2