Kumar

Reputation: 175

Parallel processing using the list and dataframe in R

I am trying to parallelize the following process in R:

df <- data.frame(col1 = c("A","B","C"), col2 = c("D","E","F"))
mylist <- list(c(1:4),c(1:7),c(1:5))
df$col3 <- NA
df$col4 <- NA
for(i in 1:nrow(df))
{
   df$col3[i] = list(mylist[[i]])
   df$col4[i] = length(unlist(df$col3[i]))
}

I tried to parallelize it by wrapping the loop body in a function and calling it with future_lapply:

library(future.apply)
func <-function(n)
{
   for(i in n)
   {
      df$col3[i] = list(mylist[[i]])
      df$col4[i] = length(unlist(df$col3[i]))
   }
}
future_lapply(1:3,func)

This approach didn't work for me. I searched Stack Exchange but couldn't find a relevant answer. Please help. Thanks in advance.

Note:

  1. mylist and df above are toy examples; the real df can contain 10^7 rows.
  2. I am using Windows and R version 4.2.0.

Upvotes: 0

Views: 76

Answers (1)

OldManSeph

Reputation: 2721

I don't know if there is a quicker way, but this is what I came up with; it puts everything into the same df:

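library(future.apply)

# Each call fills row n in a local copy of df and returns that single row
func <- function(n) {
    df[n, 3] <- list(mylist[n])
    df[n, 4] <- length(unlist(mylist[n]))
    return(df[n, ])
}
# Collect the per-row results, then bind them back into one data frame
df <- future_lapply(1:3, func)
df <- do.call(rbind, df)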
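
A note on why the attempt in the question returns nothing useful: func there only modifies a local copy of df, and a for loop evaluates to NULL, so future_lapply gets a list of NULLs back. With parallel workers the copy lives in a separate R process anyway, so the result has to be returned and assigned in the parent session. Below is a minimal sketch of that pattern. The plan(multisession) call is my assumption about the setup (it works on Windows); it is not in the code above. I have only checked the idea against the toy data.

```r
library(future.apply)

# Assumption: a multisession plan (background R sessions), which works on Windows.
# Without a plan() call, future_lapply() falls back to sequential evaluation.
plan(multisession)

df <- data.frame(col1 = c("A", "B", "C"), col2 = c("D", "E", "F"))
mylist <- list(1:4, 1:7, 1:5)

# Workers cannot modify df in the parent session, so each call only
# returns a value; the assignment happens here, in the parent.
df$col3 <- mylist                           # list column, one element per row
df$col4 <- future_sapply(df$col3, length)   # lengths computed by the workers

# For this particular task no parallelism is needed: lengths() is vectorised.
stopifnot(identical(df$col4, lengths(mylist)))

plan(sequential)  # switch back and shut down the background workers
```

For something as cheap as taking a length, sending 10^7 elements to the workers will probably cost more than the work itself, so df$col4 <- lengths(mylist) is likely the faster option. The parallel version only pays off when the per-row function is expensive.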

Upvotes: -1
