Reputation: 3
I have a data frame df and need to apply a function (calc.fitness) that gives a score to each column:
df
# ch1 ch2 ch3 ch4 ch5 ch6 ch7 ch8
# g1 5 2 7 10 7 10 10 6
# g2 1 4 5 4 1 2 5 4
# g3 16 14 7 4 2 2 8 7
# g4 7 5 5 3 2 5 1 6
# g5 7 2 1 3 7 2 4 1
# g6 4 7 11 4 9 3 9 14
# g7 12 8 6 7 5 9 7 4
# g8 4 2 3 2 2 4 1 1
# g9 1 2 1 1 2 1 2 1
Using sapply, I get the following result, which is correct but becomes very slow as df grows:
sapply(as.list(df), calc.fitness,filterTable=my.df)
# ch1 ch2 ch3 ch4 ch5 ch6 ch7 ch8
# 8.481359e-02 6.419552e-01 5.847587e-02 6.713477e-02 1.552056e-01 1.305787e+34 2.805074e-01 2.039931e+00
I used mclapply to make it faster, as follows:
library(parallel)
numCores <- detectCores()
result <- unlist(mclapply(1:8, function(x) {
  calc.fitness(df[, x], filterTable = my.df)
}, mc.preschedule = TRUE, mc.cores = numCores))
# result
# [1] 8.481359e-02 8.481359e-02 8.481359e-02 8.481359e-02 1.305787e+34 1.305787e+34 1.305787e+34 1.305787e+34
But as the results show, mclapply does not work correctly: the same values are repeated instead of one score per column. I do not know what the problem is or how to fix it. I would really appreciate any help!
PS: calc.fitness is a long function; I tried to shorten it here:
calc.fitness <- function(df.val, filterTable = my.df) {
  input.path <- "/home/Nikki/Desktop/v2017.0/exec/Input_2017.txt"
  filterTable$xe <- df.val[1]
  filterTable$xth <- df.val[2]
  filterTable$xfi <- df.val[3]
  filterTable$xfw <- df.val[4]
  filterTable$xfm <- df.val[5]
  filterTable$xls <- df.val[6]
  filterTable$xhls <- df.val[7]
  filterTable$xvt <- df.val[8]
  filterTable$xvd <- df.val[9]
  write.fwf(filterTable, append = TRUE, file = paste("Input_2017", ".txt", sep = ""), width = 25, rownames = FALSE, colnames = FALSE, quote = FALSE)
  command <- "wine /home/Nikki/Desktop/v2017.0/exec/2017File.exe"
  system(command)
  output.file <- read.table("/home/Nikki/Desktop/v2017.0/exec/Output_2017.txt", header = TRUE, fill = TRUE)
  output.pgt <- as.numeric(levels(output.file$pgt))[output.file$pgt]
  calc.sol <- output.pgt[!is.na(output.pgt)]
  opt.sol <- filterTable$PressureDropGL
  n <- length(calc.sol)
  subtract.val <- calc.sol - opt.sol
  denominator <- opt.sol
  sq.output <- (subtract.val / denominator) ^ 2
  fitness.val <- sum(sq.output) / n
  return(fitness.val)
} # end of function
my.df:
Appreciate your help.
Upvotes: 0
Views: 898
Reputation: 36
If it works with sapply but not with mclapply, it is most likely because sapply and lapply differ slightly: sapply simplifies its result, while lapply (and therefore mclapply) always returns a list. What you want is something like mcsapply rather than mclapply.
If that is the case, you will find an implementation of mcsapply, which I use extensively in my own code, in the answer to this question:
multicore::sapply?
By the way, I think this question is a duplicate of that one.
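For illustration, here is a minimal sketch of what such a helper could look like. The name mcsapply and its body are my own assumption, not the code from the linked answer: it simply runs mclapply and then applies simplify2array, the same simplification step sapply uses. It does not reproduce the USE.NAMES handling that sapply has for character vectors.

```r
library(parallel)

# Hypothetical helper (assumption, not part of the parallel package):
# run mclapply, then simplify the resulting list the way sapply would.
mcsapply <- function(X, FUN, ..., mc.cores = 1L) {
  res <- mclapply(X, FUN, ..., mc.cores = mc.cores)
  simplify2array(res)
}

# Small toy data frame, only for demonstration.
df <- data.frame(ch1 = c(5L, 1L, 16L), ch2 = c(2L, 4L, 14L))

# Forking is not available on Windows, so fall back to one core there.
n.cores <- if (.Platform$OS.type == "windows") 1L else 2L

seq_res <- sapply(df, sum)
par_res <- mcsapply(df, sum, mc.cores = n.cores)
```

With a pure function such as sum, both calls should give the same named vector. If the function has side effects (writing to shared files, for example), this wrapper will not make it safe to run in parallel.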
Upvotes: 0
Reputation: 8506
sapply simplifies its result to a matrix, whereas calling unlist on the list returned by mclapply concatenates the per-column vectors one after the other. Take the cumsum function as an illustration:
df <-
structure(
list(
ch1 = c(5L, 1L, 16L, 7L, 7L, 4L, 12L, 4L, 1L),
ch2 = c(2L, 4L, 14L, 5L, 2L, 7L, 8L, 2L, 2L),
ch3 = c(7L, 5L, 7L, 5L, 1L, 11L, 6L, 3L, 1L),
ch4 = c(10L, 4L, 4L, 3L, 3L, 4L, 7L, 2L, 1L),
ch5 = c(7L, 1L, 2L, 2L, 7L, 9L, 5L, 2L, 2L),
ch6 = c(10L, 2L, 2L, 5L, 2L, 3L, 9L, 4L, 1L),
ch7 = c(10L, 5L, 8L, 1L, 4L, 9L, 7L, 1L, 2L),
ch8 = c(6L, 4L, 7L, 6L, 1L, 14L, 4L, 1L, 1L)
),
class = "data.frame",
row.names = c("g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8", "g9")
)
sapply(as.list(df), cumsum)
#> ch1 ch2 ch3 ch4 ch5 ch6 ch7 ch8
#> [1,] 5 2 7 10 7 10 10 6
#> [2,] 6 6 12 14 8 12 15 10
#> [3,] 22 20 19 18 10 14 23 17
#> [4,] 29 25 24 21 12 19 24 23
#> [5,] 36 27 25 24 19 21 28 24
#> [6,] 40 34 36 28 28 24 37 38
#> [7,] 52 42 42 35 33 33 44 42
#> [8,] 56 44 45 37 35 37 45 43
#> [9,] 57 46 46 38 37 38 47 44
unlist(parallel::mclapply(1:8, function(x) {
return(cumsum(df[,x]))}, mc.preschedule = TRUE, mc.cores = 4L))
#> [1] 5 6 22 29 36 40 52 56 57 2 6 20 25 27 34 42 44 46 7 12 19 24 25 36 42
#> [26] 45 46 10 14 18 21 24 28 35 37 38 7 8 10 12 19 28 33 35 37 10 12 14 19 21
#> [51] 24 33 37 38 10 15 23 24 28 37 44 45 47 6 10 17 23 24 38 42 43 44
do.call(cbind, parallel::mclapply(1:8, function(x) {
return(cumsum(df[,x]))}, mc.preschedule = TRUE, mc.cores = 4L))
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
#> [1,] 5 2 7 10 7 10 10 6
#> [2,] 6 6 12 14 8 12 15 10
#> [3,] 22 20 19 18 10 14 23 17
#> [4,] 29 25 24 21 12 19 24 23
#> [5,] 36 27 25 24 19 21 28 24
#> [6,] 40 34 36 28 28 24 37 38
#> [7,] 52 42 42 35 33 33 44 42
#> [8,] 56 44 45 37 35 37 45 43
#> [9,] 57 46 46 38 37 38 47 44
Created on 2020-03-25 by the reprex package (v0.3.0)
Edit:
After seeing your function: you are appending the data generated in it to a single file. That may work fine when done sequentially, but when you do it from parallel processes you are bound to run into trouble. Spawning multiple wine processes in parallel may also not be the most efficient approach to begin with, even if it yielded the correct results (profiling your sequential code with the profvis package would show you the bottleneck). Is there any alternative to 2017File.exe for calculating fitness.val?
If your plan really was to append the results from each column sequentially, then to run the exe in parallel properly you may have to save a unique copy of the sequentially growing file (the output of your write.fwf command) for each step, pass those copies to the exe in parallel so that each step generates its own output file, and then load the results in the correct order.
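As a rough sketch of the "unique files per task" idea, the example below gives every column its own temporary directory for input and output, so parallel workers never touch the same file. Everything here is an assumption for illustration: run_external is a hypothetical stand-in for the wine call (it just sums the input), and the file names Input.txt and Output.txt are made up. Whether 2017File.exe can be pointed at a different directory is something you would have to check.

```r
library(parallel)

# Hypothetical stand-in for the external program: reads Input.txt from
# `dir` and writes a single number to Output.txt in the same directory.
run_external <- function(dir) {
  x <- scan(file.path(dir, "Input.txt"), quiet = TRUE)
  writeLines(as.character(sum(x)), file.path(dir, "Output.txt"))
}

# Score one column using a private working directory, so that no two
# parallel tasks read or write the same file.
score_column <- function(i, df) {
  work.dir <- tempfile(pattern = paste0("col", i, "_"))
  dir.create(work.dir)
  on.exit(unlink(work.dir, recursive = TRUE))  # clean up when done
  writeLines(as.character(df[[i]]), file.path(work.dir, "Input.txt"))
  run_external(work.dir)
  as.numeric(readLines(file.path(work.dir, "Output.txt")))
}

# Small toy data frame, only for demonstration.
df <- data.frame(ch1 = c(5, 1, 16), ch2 = c(2, 4, 14), ch3 = c(7, 5, 7))

# Forking is not available on Windows, so fall back to one core there.
n.cores <- if (.Platform$OS.type == "windows") 1L else 2L

# mclapply returns results in the order of its input, so the scores
# line up with the columns.
scores <- unlist(mclapply(seq_along(df), score_column, df = df,
                          mc.cores = n.cores))
names(scores) <- names(df)
```

The point of the design is that each task owns its files from start to finish; the order of the results is then handled by mclapply rather than by the order in which files happen to be written.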
Upvotes: 1