Fred

Reputation: 3

R snowfall: parallel apply on table columns

I have a table M with many rows and columns, read from a text file:

M <- read.table("text.csv",header=TRUE,sep="\t")

To compute the ranks within each column, I successfully used:

M <- apply(M,2,rank)

I would like to speed up the computation, but I have not managed to do this with snowfall.

I tried:

library(snowfall)
sfStop()
nb.cpus <- 8
sfInit(parallel=TRUE, cpus=nb.cpus, type = "SOCK")
M <- sfClusterApplyLB(M, rank) # does not work
M <- sfClusterApply(M,2,rank) # does not work
M <- sfClusterApplyLB(1:8, rank,M) # does not work

What is the equivalent of M <- apply(M,2,rank) in snowfall?

Thanks in advance for your help!

Upvotes: 0

Views: 341

Answers (3)

Fred

Reputation: 3

Thank you very much for your help!

I ended up combining Lucas's and Steve's answers into a solution that fits my problem.

I think my code did not work with M <- sfClusterApply(M,2,rank) because sfExportAll() was missing.

In the end, the simplest solution that works for me is:

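library(snowfall)

M <- read.table("text.csv", header=TRUE, sep="\t")
nb.cpus <- 4

sfStop()                    # stop any cluster left over from a previous run
sfInit(parallel=TRUE, cpus=nb.cpus, type="SOCK")
sfExportAll()               # export the global variables to the workers
M <- sfApply(M, 2, rank)    # parallel counterpart of apply(M, 2, rank)
sfRemoveAll()               # clean up the exported variables on the workers
sfStop()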
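To check that the parallel call really gives the same ranks as the sequential one, a small comparison along these lines may help. This is only a sketch: the synthetic data frame stands in for text.csv, and the names seq_ranks, par_ranks and same are made up for the illustration. Like Steve's example, it calls sfApply without sfExportAll(), since M is passed as an argument.

```r
library(snowfall)

# Small synthetic table standing in for text.csv (hypothetical data)
set.seed(1)
M <- data.frame(replicate(5, sample(0:100, 200, replace = TRUE)))

# Sequential reference result
seq_ranks <- apply(M, 2, rank)

# Parallel version on two local workers
sfInit(parallel = TRUE, cpus = 2, type = "SOCK")
par_ranks <- sfApply(M, 2, rank)
sfStop()

# Compare the values only, ignoring row/column names
same <- isTRUE(all.equal(unname(seq_ranks), unname(par_ranks)))
print(same)
```

If same is not TRUE on your data, the two code paths disagree and the parallel version should not be trusted as a drop-in replacement.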

Upvotes: 0

Steve Weston

Reputation: 19677

The equivalent of apply in snowfall is sfApply. Here's an example:

library(snowfall)
sfInit(parallel=TRUE, cpus=4, type="SOCK")
M <- data.frame(matrix(rnorm(40000000), 2000000, 20))
r <- sfApply(M, 2, rank)
sfStop()

This example runs almost twice as fast as the sequential version on my Linux machine using four cores. That's not too bad considering that rank isn't very computationally intensive.
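To reproduce this kind of comparison, a minimal timing sketch could look like the following. The smaller matrix and the names t_seq and t_par are assumptions chosen for the illustration, and the timings will vary by machine and core count, so no particular speed-up is implied.

```r
library(snowfall)

# Smaller matrix than in the example above so the sketch runs quickly
set.seed(42)
M <- data.frame(matrix(rnorm(2000000), 100000, 20))

# Elapsed time of the sequential version
t_seq <- system.time(r_seq <- apply(M, 2, rank))["elapsed"]

# Elapsed time of the parallel version (cluster start-up is not included)
sfInit(parallel = TRUE, cpus = 4, type = "SOCK")
t_par <- system.time(r_par <- sfApply(M, 2, rank))["elapsed"]
sfStop()

cat("sequential:", t_seq, "s; parallel:", t_par, "s\n")
```

Timing only the sfApply call leaves out the cost of sfInit, which matters if the cluster is started for a single short computation.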

Upvotes: 1

Lucas Fortini

Reputation: 2460

Here is a working example:

library(snowfall)

# Rank column i of the global data frame M (M must be exported to the workers)
rank_M_df_col_fx <- function(i) {
  rank(M[, i])
}

# Example data: 10 columns of 1000 random integers between 0 and 100
M <- data.frame(replicate(10, sample(0:100, 1000, replace=TRUE)))
n_cols <- ncol(M)

sfInit(parallel=TRUE)
sfExportAll()   # make M and rank_M_df_col_fx available on the workers
rank_results_list <- sfLapply(seq_len(n_cols), rank_M_df_col_fx)
rank_dataframe <- data.frame(matrix(unlist(rank_results_list), nrow=nrow(M), byrow=FALSE))

sfRemoveAll()
sfStop()

That said, ranking is such a fast operation that parallelizing it will likely not give substantially faster results, given the overhead of starting the worker instances, etc.
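A possibly simpler variant of the example above, offered as a sketch rather than a tested recommendation: a data frame is a list of columns, so, assuming sfLapply walks over list elements the way lapply does, the columns can be passed to rank directly. Each column then travels to a worker as an argument, and no helper function or sfExportAll() call is needed. The cpus value is an arbitrary choice for the illustration.

```r
library(snowfall)

# Example data: 10 columns of 1000 random integers between 0 and 100
set.seed(1)
M <- data.frame(replicate(10, sample(0:100, 1000, replace = TRUE)))

sfInit(parallel = TRUE, cpus = 2)
# Each column is handed to rank() on a worker; M itself is not exported
rank_results_list <- sfLapply(M, rank)
sfStop()

# Reassemble the list of rank vectors into a data frame
rank_dataframe <- as.data.frame(rank_results_list)
```

The result can be compared against apply(M, 2, rank) to confirm that both approaches agree on your data.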

Upvotes: 0
