cookie monster

Reputation: 51

Pattern matching using lapply in R

In R I have rainfall data from up to 44 different sites. The subfolders with the rainfall data look like this:

"data_in/1995", "data_in/1996", "data_in/2019"

The first 5 files in the first folder are as follows: "S01Y95.out" "S02Y95.out" "S03Y95.out" "S04Y95.out" "S05Y95.out"

My R code creates a list of the subfolder names and uses lapply to apply a function to each subfolder. The function looks for the files whose names contain "S01" (Site No. 1) and copies them to another folder for further analysis. This all works fine with the following code (sorry, I have not yet figured out how to use dput and the like to create a proper reproducible example):

# set path to the directory with the raw data
data_in_path<-'data_in'

# get the names of the subfolders
subfolder_names<-dir(data_in_path,full.names = TRUE)
subfolder_names<-as.list(subfolder_names)

S01_raw_path<-'Site_Data/S01_Raw'

# function to search subdirectories for matching files
p_match_Function <- function(files){

  # list the files in this subfolder whose names contain "S01"
  file_match<-list.files(files,pattern = "S01")

  # copy the matching file to site subdirectory
  file.copy(file.path(files,file_match), S01_raw_path)
}

# use lapply to run p_match function on each subfolder
lapply(subfolder_names, function(x) p_match_Function(x))
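Since the question mentions not knowing how to make a reproducible example, here is a minimal sketch of one. It builds a small, made-up copy of the folder layout under a temporary directory (two years, three sites, empty files named like "S01Y95.out") and runs the S01 code against it. The years, site codes and the `root` folder are assumptions for illustration only. The sketch also creates the destination folder first, because `file.copy()` only copies *into* `to` when `to` is an existing directory; otherwise it treats `to` as a target file name.

```r
# Made-up data: a temporary copy of the data_in/<year>/SxxYyy.out layout
root  <- tempfile("rain_demo")          # unique temporary folder
years <- c("1995", "1996")              # assumed years
sites <- c("S01", "S02", "S03")         # assumed site codes

for (yr in years) {
  year_dir <- file.path(root, "data_in", yr)
  dir.create(year_dir, recursive = TRUE)
  # empty files such as "S01Y95.out", "S02Y95.out", "S03Y95.out"
  file.create(file.path(year_dir, paste0(sites, "Y", substr(yr, 3, 4), ".out")))
}

# the destination must exist so that file.copy() copies into it
S01_raw_path <- file.path(root, "Site_Data", "S01_Raw")
dir.create(S01_raw_path, recursive = TRUE)

subfolder_names <- as.list(dir(file.path(root, "data_in"), full.names = TRUE))

p_match_Function <- function(files) {
  # names of the files in this subfolder that contain "S01"
  file_match <- list.files(files, pattern = "S01")
  # copy them (using their full paths) into the site folder
  file.copy(file.path(files, file_match), S01_raw_path)
}

# lapply can take the function itself; no wrapper function is needed
invisible(lapply(subfolder_names, p_match_Function))
print(list.files(S01_raw_path))
```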

The above code works, but notice that the pattern ("S01") is hard-coded inside the function. This is far from ideal, because I would rather pass the site name, e.g. "S01", "S02", to the function from a list. The alternative would be to call p_match_Function once for every site, each time manually changing the pattern (site name) to "S01", "S02", etc.
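A minimal sketch of passing the site code in as an argument, assuming the same `data_in/<year>/SxxYyy.out` layout (rebuilt here in a temporary folder with two made-up years and sites). The hypothetical helper `find_site_files()` only lists the matches instead of copying them, so it works as a safe dry run. The outer `lapply()` walks the site codes, the inner one walks the year folders, and `site` is forwarded to the helper through `lapply()`'s `...` argument.

```r
# Made-up data in a temporary folder
root <- tempfile("rain_demo")
for (yr in c("1995", "1996")) {
  year_dir <- file.path(root, "data_in", yr)
  dir.create(year_dir, recursive = TRUE)
  file.create(file.path(year_dir, paste0(c("S01", "S02"), "Y", substr(yr, 3, 4), ".out")))
}
subfolder_names <- dir(file.path(root, "data_in"), full.names = TRUE)

# hypothetical helper: the site code is an argument, not hard-coded
find_site_files <- function(folder, site) {
  list.files(folder, pattern = site)
}

site_names <- c("S01", "S02")
# outer lapply over sites, inner lapply over year folders;
# `site = site` is passed on to find_site_files by lapply
matches <- lapply(site_names, function(site) {
  unlist(lapply(subfolder_names, find_site_files, site = site))
})
names(matches) <- site_names
print(matches)
```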

I tried a for loop as follows, but it gives me an error.

# strip the site names from the raw file names in the 2019 folder
site_names<-strtrim(dir('data_in/2019'),3)

# loop through the site names and pass the site name to the p_match_function

for (i in seq_along(site_names)) {
  name<-site_names[i]
  lapply(subfolder_names, function(x,name) p_match_Function(x,name))
}  

This gives an error "Error in p_match_Function(x, name) : unused argument (name)".

I am really stuck here. I suspect the way I call lapply in the second example is just plain wrong. Should I forget about lapply and go back to nested loops? That would go against the functional programming approach, which is a real strength of R. Thoughts?
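The error comes from the function definition rather than from lapply itself: R signals "unused argument" whenever a call supplies more arguments than the function declares. A minimal reproduction, using the made-up functions `f_one()` and `f_two()` as stand-ins for the one- and two-parameter versions of p_match_Function:

```r
# declares a single parameter, like the original p_match_Function(files)
f_one <- function(files) files

# calling it with two arguments raises the "unused argument" error
msg <- tryCatch(f_one("data_in/1995", "S01"), error = conditionMessage)
print(msg)

# declaring the second parameter makes the same call valid
f_two <- function(files, name) paste(files, name)
print(f_two("data_in/1995", "S01"))
```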


Another user replied with a suggestion, but I still got an error. I then found an answer in another post, "passing several arguments to FUN of lapply (and others *apply)". In the end, keeping that suggested change, I just had to modify the lapply call as follows, which solved the problem:

# loop through the site names and pass the site name to the p_match_function

for (i in seq_along(site_names)) {

  name<-site_names[i]
#  lapply(subfolder_names, function(x,name) p_match_Function(x,name))
  lapply(subfolder_names, p_match_Function, name)

}  
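The fix works because `lapply(X, FUN, ...)` passes any extra arguments in `...` on to `FUN` at every call. A sketch with a made-up `show_match()` stand-in, showing that this is equivalent to passing the argument by name, and to an anonymous function with a single parameter that finds `name` in the surrounding environment:

```r
# stand-in for p_match_Function: just combines folder and site code
show_match <- function(folder, name) paste(folder, name)

folders <- c("data_in/1995", "data_in/1996")
name <- "S02"

# 1. extra argument forwarded positionally through lapply's `...`
res1 <- lapply(folders, show_match, name)

# 2. the same, passed by name (clearer when FUN has several arguments)
res2 <- lapply(folders, show_match, name = name)

# 3. anonymous function with ONE parameter; `name` is looked up in the
#    enclosing (here: global) environment
res3 <- lapply(folders, function(x) show_match(x, name))

stopifnot(identical(res1, res2), identical(res1, res3))
print(unlist(res1))  # "data_in/1995 S02" "data_in/1996 S02"
```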


Answers (2)

cookie monster

Reputation: 51

OK, here is the code that did not work, because the function call was not correct:

# function to search subdirectories for matching files
p_match_Function <- function(files,name) {

  # find a file that matches pattern = name
  file_match<-list.files(files,pattern = name)

  # copy the matching file to site subdirectory
  file.copy(file.path(files,file_match), S01_raw_path)
}
#
# strip the site names from the raw file names in the 1995 folder
site_names<-strtrim(dir('data_in/1995'),3)

# loop through the site names and pass the site name to the p_match_function

for (i in seq_along(site_names)) {

  name<-site_names[i]
  lapply(subfolder_names, function(x,name) p_match_Function(x,name))

}  
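The anonymous function in this version, `function(x, name) p_match_Function(x, name)`, declares its own `name` parameter. `lapply()` only fills in the first argument, so that `name` is missing, and because it is a parameter it hides the `name` variable set in the loop. A sketch of the same failure with the made-up `show_match()` stand-in:

```r
show_match <- function(folder, name) paste(folder, name)
folders <- c("data_in/1995", "data_in/1996")
name <- "S01"   # never used below: the parameter `name` shadows it

# lapply() supplies only x, so the anonymous function's `name` is missing;
# the error is raised when show_match() tries to use it
msg <- tryCatch(
  lapply(folders, function(x, name) show_match(x, name)),
  error = conditionMessage
)
print(msg)
```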

The code below, on the other hand, did work. Compared with the code in my question, more than the lapply line changed: p_match_Function now also takes a name argument.

# function to search subdirectories for matching files
p_match_Function <- function(files,name) {

  # find a file that matches pattern = name
  file_match<-list.files(files,pattern = name)

  # copy the matching file to site subdirectory
  file.copy(file.path(files,file_match), S01_raw_path)
}
#
# strip the site names from the raw file names in the 1995 folder
site_names<-strtrim(dir('data_in/1995'),3)

# loop through the site names and pass the site name to the p_match_function

for (i in seq_along(site_names)) {

  name<-site_names[i]
  lapply(subfolder_names, p_match_Function, name)  
}  
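In both versions every site's files are copied to `S01_raw_path`, so all sites end up in the S01 folder. A sketch of one way to give each site its own folder, assuming a `Site_Data/<site>_Raw` naming scheme (hypothetical) and an extra `out_dir` argument, run against a small made-up copy of the data in a temporary directory:

```r
# Made-up data in a temporary folder
root <- tempfile("rain_demo")
for (yr in c("1995", "1996")) {
  year_dir <- file.path(root, "data_in", yr)
  dir.create(year_dir, recursive = TRUE)
  file.create(file.path(year_dir, paste0(c("S01", "S02"), "Y", substr(yr, 3, 4), ".out")))
}
subfolder_names <- as.list(dir(file.path(root, "data_in"), full.names = TRUE))
site_names <- c("S01", "S02")
out_dir <- file.path(root, "Site_Data")

p_match_Function <- function(files, name, out_dir) {
  # one destination per site, e.g. Site_Data/S02_Raw (assumed naming)
  dest <- file.path(out_dir, paste0(name, "_Raw"))
  dir.create(dest, recursive = TRUE, showWarnings = FALSE)
  # copy this site's files from the current year folder
  file_match <- list.files(files, pattern = name)
  file.copy(file.path(files, file_match), dest)
}

for (name in site_names) {
  # name and out_dir are forwarded to p_match_Function by lapply
  lapply(subfolder_names, p_match_Function, name, out_dir)
}
print(list.files(out_dir, recursive = TRUE))
```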


Waldi

Reputation: 41220

The error message tells you that p_match_Function doesn't accept an argument name, which is true because right now it takes only files as an argument. You should add name to the arguments:

p_match_Function <- function(files,name){

  # list the files in this subfolder whose names match the pattern in name
  file_match<-list.files(files,pattern = name)

  # copy the matching file to site subdirectory
  file.copy(file.path(files,file_match), S01_raw_path)
}
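Once the function takes both the folder and the site code, it can also be called once per (folder, site) pair with `Map()` over an `expand.grid()` of all combinations, which removes both the `for` loop and the nested `lapply()`. The sketch below also collects site codes from every year folder rather than from a single year, in case a site is missing in that year. The data and the `list_site_files()` helper are made up, and matches are only counted, not copied.

```r
# Made-up data: site S03 only exists in 1996
root <- tempfile("rain_demo")
data_in <- file.path(root, "data_in")
site_layout <- list("1995" = c("S01", "S02"), "1996" = c("S01", "S02", "S03"))
for (yr in names(site_layout)) {
  year_dir <- file.path(data_in, yr)
  dir.create(year_dir, recursive = TRUE)
  file.create(file.path(year_dir, paste0(site_layout[[yr]], "Y", substr(yr, 3, 4), ".out")))
}
subfolder_names <- dir(data_in, full.names = TRUE)

# site codes taken from the file names in all year folders
site_names <- unique(substr(basename(list.files(data_in, recursive = TRUE)), 1, 3))

# two-argument function, as in the answer (listing only, no copying)
list_site_files <- function(files, name) list.files(files, pattern = name)

# one row per (folder, site) combination; folder varies fastest
pairs <- expand.grid(folder = subfolder_names, site = site_names,
                     stringsAsFactors = FALSE)
matches <- Map(list_site_files, pairs$folder, pairs$site)
pairs$n_files <- lengths(matches, use.names = FALSE)
print(pairs[, c("site", "n_files")])
```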
