Performing the same operations on multiple datasets in R

Question

I'm trying to write a function in R that performs the same operations on many different data sets, with the following code:

library(parallel)
cluster = makeCluster(2)
setwd("D:\\Speciale")


data_func <- function(kommune) {

  rm(list=ls())  
  
  library(dplyr)
  library(data.table)
  library (tidyr)
  
  #Load address and turbine datasets
  distances <- fread(file="Adresser og distancer\\kommune.csv", header=TRUE, sep=",", colClasses = c("longitude" = "character", "latitude" = "character", "min_distance" = "character", "distance_turbine" = "character", "id_turbine" = "character"), encoding="Latin-1")
  turbines <- fread(file="turbines_DK.csv", header=TRUE, sep=",", colClasses = c("lon" = "character", "lat" = "character", "id_turbine" = "character", "total_height" = "character", "location" = "character"), encoding="Latin-1")

  #Some cleaning of the data and construction of new variables

#write out the dataset
  setwd("D:\\Speciale\\Analysedata")
  fwrite(mock_final, file = "final_kommune.csv", row.names = FALSE)
  

}

do.call(rbind, parLapply(cl = cluster, c("Albertslund", "Alleroed"), data_func))

When I do this, I get the following error message:

Error in checkForRemoteErrors(val) : 2 nodes produced errors; first error: File 'Adresser og distancer\kommune.csv' does not exist or is non-readable. getwd()=='C:/Users/KSAlb/OneDrive/Dokumenter'

I need the function to change the file names for each data set. For Albertslund, it should insert Albertslund in place of kommune in the file names, perform the operations, write out a CSV file (named "final_Albertslund.csv" rather than "final_kommune.csv"), clear the environment, and then move on to the next data set, Alleroed.

Albertslund and Alleroed are just examples; there are 98 data sets in total that I need to process.

Rui Barradas · Accepted Answer

Maybe something like the code below can be of help. Untested, since there are no data.

library(parallel)
library(dplyr)
library(data.table)
library(tidyr)

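# inpath is absolute: setwd() in the main session does not change the workers' working directory
data_func <- function(kommune, inpath = "D:/Speciale/Adresser og distancer", 
                      turbines, outpath = "D:/Speciale/Analysedata") {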
  filename <- paste0(kommune, ".csv")
  filename <- file.path(inpath, filename)
  #Load address and turbine datasets
  distances <- fread(
    file = filename,
    header = TRUE,
    sep = ",",
    colClasses = c("longitude" = "character", "latitude" = "character", "min_distance" = "character", "distance_turbine" = "character", "id_turbine" = "character"),
    encoding = "Latin-1"
  )

  #Some cleaning of the data and construction of new variables

  #write out the dataset
  outfile <- paste0("final_", kommune, ".csv")
  outfile <- file.path(outpath, outfile)
  fwrite(mock_final, file = outfile, row.names = FALSE)
}

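cluster <- makeCluster(2)
setwd("D:/Speciale")

# Load the packages on every worker; the library() calls above only affect the main session
invisible(clusterEvalQ(cluster, {
  library(dplyr)
  library(data.table)
  library(tidyr)
}))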

# Read turbines file just once
turbines <- fread(
  file = "turbines_DK.csv",
  header = TRUE,
  sep=",",
  colClasses = c("lon" = "character", "lat" = "character", "id_turbine" = "character", "total_height" = "character", "location" = "character"),
  encoding = "Latin-1"
)

kommune_vec <- c("Albertslund", "Alleroed")
do.call(rbind, parLapply(cl = cluster, kommune_vec, data_func, turbines = turbines))

# Release the worker processes when done
stopCluster(cluster)
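Since the real data are not available, here is a minimal, self-contained sketch of the same pattern that can be run as is. Everything in it is made up for illustration: the temporary folders, the toy CSV contents, and the helper name process_one are assumptions, and base R's read.csv/write.csv stand in for fread/fwrite. It also shows one possible way to get all the names (98 in the real case) from the input folder with list.files instead of typing them by hand.

```r
library(parallel)

# Hypothetical stand-in data: two tiny CSV files in a temporary folder
base_dir <- file.path(tempdir(), "kommune_demo")
in_dir   <- file.path(base_dir, "input")
out_dir  <- file.path(base_dir, "output")
dir.create(in_dir,  recursive = TRUE, showWarnings = FALSE)
dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)

for (k in c("Albertslund", "Alleroed")) {
  write.csv(data.frame(id = 1:3, min_distance = c(10, 20, 30)),
            file.path(in_dir, paste0(k, ".csv")), row.names = FALSE)
}

# Derive the data set names from the files that are actually in the folder
kommune_vec <- tools::file_path_sans_ext(list.files(in_dir, pattern = "\\.csv$"))

# Toy worker function: full paths are built from the arguments, so it does
# not depend on the working directory of the worker process
process_one <- function(kommune, in_dir, out_dir) {
  infile <- file.path(in_dir, paste0(kommune, ".csv"))
  dat <- read.csv(infile)
  dat$kommune <- kommune                 # placeholder for the real cleaning
  outfile <- file.path(out_dir, paste0("final_", kommune, ".csv"))
  write.csv(dat, outfile, row.names = FALSE)
  outfile                                # return the path that was written
}

cl <- makeCluster(2)
written <- unlist(parLapply(cl, kommune_vec, process_one,
                            in_dir = in_dir, out_dir = out_dir))
stopCluster(cl)
```

Each call gets its own local variables, so there is no need for rm(list = ls()) inside the function; the objects disappear when the call returns.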
