Reputation: 2054
I am looking at parallel processing in R and was wondering whether I could read in multiple txt files in parallel instead of sequentially. The reason is that I have a Shiny application and want to cut down its loading time, a large chunk of which comes from loading the files.
Current situation:
Shipments_Raw <- read.delim('/srv/samba/share/SAP data/_zmrosales_ship_month.txt', fill = TRUE)
ShipmentsYear_Raw <- read.delim('/srv/samba/share/SAP data/_zmrosales_ship_year.txt', fill = TRUE)
Open_Raw <- read.delim('/srv/samba/share/SAP data/_zmrosales_wip.txt', fill = TRUE)
WIP_Short_Raw <- read.delim('/srv/samba/share/SAP data/_zmro_short.txt', fill = TRUE)
WIP_RTQT_Raw <- read.delim('/srv/samba/share/SAP data/_zmro_sno_tasks_year.txt', fill = TRUE)
Invoiced_Raw <- read.delim('/srv/samba/share/SAP data/_zmrosales_inv.txt', fill = TRUE)
I have seen examples of reading files in parallel, but they all end by combining the files into a single object. I want each imported file as a separate data frame.
Here are some examples:
How do you read in multiple .txt files into R?
https://www.r-bloggers.com/import-all-text-files-in-a-folder-with-parallel-execution/
Ideal situation (pseudocode; I know this isn't valid R):
RunParallel {
Shipments_Raw <- read.delim('/srv/samba/share/SAP data/_zmrosales_ship_month.txt', fill = TRUE)
ShipmentsYear_Raw <- read.delim('/srv/samba/share/SAP data/_zmrosales_ship_year.txt', fill = TRUE)
Open_Raw <- read.delim('/srv/samba/share/SAP data/_zmrosales_wip.txt', fill = TRUE)
WIP_Short_Raw <- read.delim('/srv/samba/share/SAP data/_zmro_short.txt', fill = TRUE)
WIP_RTQT_Raw <- read.delim('/srv/samba/share/SAP data/_zmro_sno_tasks_year.txt', fill = TRUE)
Invoiced_Raw <- read.delim('/srv/samba/share/SAP data/_zmrosales_inv.txt', fill = TRUE)
}
Update after the answer below:
tic <- Sys.time()
Shipments_Raw <- read.delim('/srv/samba/share/SAP data/_zmrosales_ship_month.txt', fill = TRUE)
ShipmentsYear_Raw <- read.delim('/srv/samba/share/SAP data/_zmrosales_ship_year.txt', fill = TRUE)
Open_Raw <- read.delim('/srv/samba/share/SAP data/_zmrosales_wip.txt', fill = TRUE)
WIP_Short_Raw <- read.delim('/srv/samba/share/SAP data/_zmro_short.txt', fill = TRUE)
WIP_RTQT_Raw <- read.delim('/srv/samba/share/SAP data/_zmro_sno_tasks_year.txt', fill = TRUE)
Invoiced_Raw <- read.delim('/srv/samba/share/SAP data/_zmrosales_inv.txt', fill = TRUE)
toc <- Sys.time()
Sequential <- toc - tic
tic <- Sys.time()
# Paths in the order they are unpacked below
files <- c("/srv/samba/share/SAP data/_zmrosales_ship_month.txt",
"/srv/samba/share/SAP data/_zmrosales_ship_year.txt",
"/srv/samba/share/SAP data/_zmrosales_inv.txt",
"/srv/samba/share/SAP data/_zmrosales_wip.txt",
"/srv/samba/share/SAP data/_zmro_short.txt",
"/srv/samba/share/SAP data/_zmro_sno_tasks_year.txt")
x2 <- lapply(files, data.table::fread)
# Use [[ ]] to extract each element itself rather than a one-element list
Shipments_Raw <- as.data.frame(x2[[1]])
ShipmentsYear_Raw <- as.data.frame(x2[[2]])
Invoiced_Raw <- as.data.frame(x2[[3]])
Open_Raw <- as.data.frame(x2[[4]])
WIP_Short_Raw <- as.data.frame(x2[[5]])
WIP_RTQT_Raw <- as.data.frame(x2[[6]])
toc <- Sys.time()
Lapply <- toc - tic
Sequential
Lapply
Difference in time:
> Sequential
Time difference of 6.011156 secs
> Lapply
Time difference of 0.8015034 secs
Upvotes: 0
Views: 260
Reputation: 23919
Just use lapply in combination with data.table's super fast fread:
lapply(files, data.table::fread)
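To keep each file as its own data frame, one option is to name the elements of the path vector so that the result is a named list. The sketch below is only an illustration: the temporary demo files, the helper read_one and the names in files are hypothetical stand-ins for the real SAP exports. It also shows parallel::mclapply as a possible parallel variant. That function relies on forking, so it assumes a Unix-like system (it only accepts one core on Windows), and whether it actually beats plain lapply with fread depends on file sizes and disk speed.

```r
# Hypothetical demo files standing in for the real SAP exports.
demo_dir <- tempfile("sap_demo_")
dir.create(demo_dir)
files <- c(
  Shipments_Raw = file.path(demo_dir, "ship_month.txt"),
  Invoiced_Raw  = file.path(demo_dir, "inv.txt")
)
write.table(data.frame(id = 1:3, qty = c(10, 20, 30)),
            files[["Shipments_Raw"]],
            sep = "\t", row.names = FALSE, quote = FALSE)
write.table(data.frame(id = 1:2, amount = c(5.5, 7.25)),
            files[["Invoiced_Raw"]],
            sep = "\t", row.names = FALSE, quote = FALSE)

# Illustrative helper: read one tab-delimited file as a plain data.frame.
read_one <- function(path) {
  data.table::fread(path, sep = "\t", data.table = FALSE)
}

# Sequential: lapply keeps the names of `files`, giving a named list.
raw <- lapply(files, read_one)

# Parallel variant: forks worker processes, so use one core on Windows.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L
raw_par <- parallel::mclapply(files, read_one, mc.cores = n_cores)

# Each file is available as a separate data frame.
Shipments_Raw <- raw[["Shipments_Raw"]]
Invoiced_Raw  <- raw[["Invoiced_Raw"]]
```

If separate top-level variables are preferred over the list, list2env(raw, envir = environment()) creates one object per list element. In a Shiny app, keeping the named list and indexing into it is often easier to manage.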
Upvotes: 1