Phoebe

Reputation: 287

Combining CSV files which have similar names

I have a directory of thousands of CSV files which, fortunately, follow a strict naming convention. I am trying to write a function that groups into separate data frames all of the files that end with the same last 7 digits.

I have a vector (u) of the 7 digit patterns to match:

v <- list.files(wd, full.names = FALSE)
u <- unique(substr(v, 9, 15))
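To illustrate what this extracts, here is a small sketch using made-up file names with an 8-character prefix before the 7 digits. These names are assumptions for the example only; the real file names are not shown in this post.

```r
# Hypothetical file names, used only to illustrate the key extraction
v <- c("sample_A1234567.csv", "sample_B1234567.csv", "sample_A7654321.csv")

# Characters 9 through 15 of each name hold the 7-digit code
u <- unique(substr(v, 9, 15))
u
```

With these names, two files share one code and the third has its own, so u holds two distinct patterns.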

Now I need to run each element of vector u against each file name in v, and combine all the matching files into a single data frame for each value of u.

I've tried a few things with no success:

#only matches first in list
files <- list.files(pattern=u)

#makes a list of vectors with the same contents
lapply(v, function(x) list.files(pattern=u)) 

#nope
data <- data.frame()
  for (i in 1:length(u)) {
    data <- rbind(data, read.csv(v[files]))
    }

A nudge or shove in the right direction would be greatly appreciated.

Thanks!

Upvotes: 0

Views: 75

Answers (1)

divibisan

Reputation: 12155

Nested calls to lapply should do it. The outer lapply loops through the unique patterns (u). For each pattern, the inner lapply loops through all matching files (list.files(pattern=pattern)) and reads each one in with read.table. The results are then bound together into a single data.frame with bind_rows from the dplyr package (you can also use rbind, but I find bind_rows simpler), and that data.frame is returned to the outer lapply. Note that list.files(pattern=pattern) searches the current working directory, so this assumes the working directory is the folder containing the files.

The result should be a list of data.frames, each of which contains the merged contents of all .csv files that matched a 7 digit pattern.

list_of_file_sets <- lapply(u, function(pattern) {
    # Read every file in the working directory whose name matches this pattern
    file_set <- lapply(list.files(pattern = pattern), function(file) {
        read.table(file, sep = ',', header = TRUE, stringsAsFactors = FALSE)
    })
    # Stack the files for this pattern into one data.frame and return it
    dplyr::bind_rows(file_set)
})
names(list_of_file_sets) <- u # Optionally set names of list to 7 digit pattern
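As an alternative sketch in base R, the file names can be partitioned with split() so that each file is read exactly once, instead of re-listing the directory for every pattern. The temporary directory, file names, and column contents below are made up for the demonstration; only substr(v, 9, 15) comes from the question.

```r
# Self-contained sketch: write a few small CSVs to a temporary directory,
# then group and combine them by characters 9-15 of each file name.
wd <- file.path(tempdir(), "csv_demo")   # hypothetical directory
dir.create(wd, showWarnings = FALSE)

# Hypothetical file names: 8-character prefix followed by a 7-digit code
demo_names <- c("sample_A1234567.csv", "sample_B1234567.csv", "sample_A7654321.csv")
for (nm in demo_names) {
  write.csv(data.frame(source = nm, value = 1:2),
            file.path(wd, nm), row.names = FALSE)
}

v <- list.files(wd, pattern = "\\.csv$", full.names = FALSE)
key <- substr(v, 9, 15)

# split() partitions the full paths by key, giving one character vector per code
files_by_key <- split(file.path(wd, v), key)

# Read each group's files and stack them into one data frame per code
combined <- lapply(files_by_key, function(paths) {
  do.call(rbind, lapply(paths, read.csv, stringsAsFactors = FALSE))
})

names(combined)
```

The result is a named list, so the data frame for a given code can be pulled out with combined[["1234567"]]. This uses rbind, which expects every file in a group to have the same columns; dplyr::bind_rows is more forgiving if they differ.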

Upvotes: 2
