Reputation: 287
I have a directory of thousands of CSV files which, fortunately, follow a strict naming convention. I am trying to write a function that groups into separate data frames all of the files that end with the same last 7 digits.
I have a vector (u) of the 7 digit patterns to match:
v <- list.files(wd, full.names = FALSE)
u <- unique(substr(v, 9, 15))
Now I need to match each element of vector u against the file names in v, and combine all the matching files into a single data frame for each value of u.
I've tried a few things with no success:
#only matches first in list
files <- list.files(pattern=u)
#makes a list of vectors with the same contents
lapply(v, function(x) list.files(pattern=u))
#nope
data <- data.frame()
for (i in 1:length(u)) {
data <- rbind(data, read.csv(v[files]))
}
A nudge or shove in the right direction would be greatly appreciated.
Thanks!
Upvotes: 0
Views: 75
Reputation: 12155
Nested calls to lapply should do it. The first call to lapply loops through the unique patterns (u). For each pattern, the second lapply loops through all matching files (list.files(pattern=pattern)) and reads each one in (read.table). The results are then bound together into a single data.frame with bind_rows from the dplyr package (you can also use rbind, but I find bind_rows simpler) and returned to the outer lapply.
The result should be a list of data.frames, each of which contains the merged contents of all .csv files that matched a 7 digit pattern.
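# Loop over the unique 7 digit patterns (u), not the full file list (v)
list_of_file_sets <- lapply(u, function(pattern) {
  # Read every file in the working directory whose name contains this pattern
  file_set <- lapply(list.files(pattern = pattern), function(file) {
    read.table(file, sep = ',', header = TRUE, stringsAsFactors = FALSE)
  })
  # Stack the per-file data frames into one and return it to the outer lapply
  dplyr::bind_rows(file_set)
})
names(list_of_file_sets) <- u # Optionally set names of list to 7 digit pattern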
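If you would rather not search the directory again for every pattern, here is a rough base R sketch of the same idea using split. It groups the file paths by the 7 digit key first, then reads and stacks each group. The temporary directory and the file names (siteA___0000001.csv and so on) are made up purely so the example runs on its own; it assumes, as in the question, that the key sits at characters 9 to 15 of the file name and that all files in a group share the same columns.

```r
# Hypothetical demo data: a throwaway directory with three small CSV files.
# Each name has 8 leading characters followed by the 7 digit key.
wd <- tempfile("csv_grouping_demo")
dir.create(wd)
demo_names <- c("siteA___0000001.csv", "siteB___0000001.csv", "siteA___0000002.csv")
for (i in seq_along(demo_names)) {
  write.csv(data.frame(id = i, value = i * 10),
            file.path(wd, demo_names[i]), row.names = FALSE)
}

# Same starting point as the question: file names, then the 7 digit key of each.
v <- list.files(wd, pattern = "\\.csv$", full.names = FALSE)
keys <- substr(v, 9, 15)

# split() returns a named list: one character vector of full paths per key.
files_by_key <- split(file.path(wd, v), keys)

# Read every file in a group and stack the rows into one data frame per key.
list_of_file_sets <- lapply(files_by_key, function(paths) {
  do.call(rbind, lapply(paths, read.csv, stringsAsFactors = FALSE))
})

names(list_of_file_sets)
```

Because the grouping is done on the extracted substring rather than by passing each key to list.files(pattern = ...), a key cannot accidentally match somewhere else in a file name, and the list comes back already named by key.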
Upvotes: 2