Lamma

Reputation: 1507

Loop over all subdirectories and read in a file in each subdirectory

I have an output directory from dbcans with each sample's output in its own subdirectory. I need to loop over each subdirectory and read into R a file called overview.csv.

for (subdir in list.dirs(recursive = FALSE)){
  data = read.csv(file.path(~\\subdir, "overview.csv"))
}

I am unsure how to deal with the changing file path in read.csv for each subdir. Any help would be appreciated.

Upvotes: 0

Views: 90

Answers (2)

r2evans

Reputation: 160417

Up front, the ~\\subdir (not as a string) is obviously problematic. Since subdir is already a string, using file.path is correct, but pass it just the variable: file.path(subdir, "overview.csv"). If you are concerned about relative versus absolute paths, you can always normalize them with normalizePath(list.dirs()), though this does not really change things if you use `

A few things to consider.

  1. Constantly reassigning to the same variable doesn't help: each iteration overwrites the previous result, so only the last file survives. Either assign to an element of a list or use something else (e.g., lapply, below). (I also think data is a problematic variable name. It works just fine "now", but if you ever run part of your script without first assigning to data, you will be referencing the function data instead, resulting in possibly confusing errors such as Error in data$a : object of type 'closure' is not subsettable. A closure is really just a function together with its enclosing environment, so this is just saying "you tried to do something to a function".)

  2. Switching from list.dirs to list.files lets you use both pattern= and full.names=, such as

    datalist <- list()
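    # recursive = TRUE descends through every level; I hope that doesn't go too deep here
    # pattern is a regular expression, so anchor it and escape the dot to match the exact name
    filelist <- list.files(pattern = "^overview\\.csv$", full.names = TRUE, recursive = TRUE)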
    for (ind in seq_along(filelist)) {
      datalist[[ind]] <- read.csv(filelist[ind])
    }
    # perhaps combine into one frame
    data1 <- do.call(rbind, datalist)
    
  3. Reading in lots of files and doing the same thing to all of them suggests lapply. This is a slightly more compact version of number 2:

    filelist <- list.files(pattern = "^overview\\.csv$", recursive = TRUE, full.names = TRUE)
    datalist <- lapply(filelist, read.csv)
    data1 <- do.call(rbind, datalist)
    

    Note: if you really only need precisely one level of subdirs, you can work around that with:

    filelist <- list.files(list.dirs(somepath, recursive = FALSE),
                           pattern = "^overview\\.csv$", full.names = TRUE)
    

    or you can limit to no more than some depth, perhaps with list.dirs.depth.n from https://stackoverflow.com/a/48300309.
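To make the points above concrete, here is a minimal, self-contained sketch that builds a throwaway directory tree, reads overview.csv from each first-level subdirectory, and keeps track of which sample each table came from. The directory name dbcan_demo, the sample names, and the column names gene and hits are made up for illustration; they are not from the question's data.

```r
# Build a throwaway directory tree so the sketch runs anywhere
# (root, sample names, and columns are hypothetical).
root <- file.path(tempdir(), "dbcan_demo")
for (s in c("sampleA", "sampleB")) {
  dir.create(file.path(root, s), recursive = TRUE, showWarnings = FALSE)
  write.csv(data.frame(gene = c("g1", "g2"), hits = c(1, 2)),
            file.path(root, s, "overview.csv"), row.names = FALSE)
}

# One level of subdirectories only; keep those that actually contain the file.
subdirs <- list.dirs(root, recursive = FALSE)
files <- file.path(subdirs, "overview.csv")
files <- files[file.exists(files)]

# Read each file into a list named after its subdirectory.
datalist <- lapply(files, read.csv)
names(datalist) <- basename(dirname(files))

# Optionally combine into one frame, tagging each row with its sample of origin.
combined <- do.call(rbind, Map(function(d, n) cbind(sample = n, d),
                               datalist, names(datalist)))
rownames(combined) <- NULL
```

Naming the list elements means datalist[["sampleA"]] retrieves one sample directly, and the sample column preserves that information after rbind. Note that rbind assumes every overview.csv has the same columns.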

Upvotes: 2

Georgery

Reputation: 8117

I think it should be this.

for (subdir in list.dirs(recursive = FALSE)){
    # paste0 adds no separator, so the "/" has to be included explicitly
    data = read.csv(paste0(subdir, "/overview.csv"))
}
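However the path is assembled, there has to be a separator between the directory and the file name. As a quick sketch of how the two helpers behave (the value of subdir is a hypothetical example, shaped like what list.dirs(recursive = FALSE) returns):

```r
subdir <- "./sample1"  # hypothetical value for illustration

no_sep    <- paste0(subdir, "overview.csv")   # "./sample1overview.csv" (no separator)
with_sep  <- paste0(subdir, "/overview.csv")  # "./sample1/overview.csv"
with_path <- file.path(subdir, "overview.csv") # "./sample1/overview.csv"
```

file.path inserts the "/" itself, which is why it is usually the safer choice for building paths.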

Upvotes: 1
