Reputation: 1023
I have a series of csv files (one per anum) with the same column headers and different number of rows. Originally I was reading them in and merging them like so;
setwd <- ("N:/Ring data by cruise/Shetland")
LengthHeight2013 <- read.csv("N:/Ring data by cruise/Shetland/R_0113A_S2013_WD.csv",sep=",",header=TRUE)
LengthHeight2012 <- read.csv("N:/Ring data by cruise/Shetland/R_0212A_S2012_WD.csv",sep=",",header=TRUE)
LengthHeight2011 <- read.csv("N:/Ring data by cruise/Shetland/R_0211A_S2011_WOD.csv",sep=",",header=TRUE)
LengthHeight2010 <- read.csv("N:/Ring data by cruise/Shetland/R_0310A_S2010_WOD.csv",sep=",",header=TRUE)
LengthHeight2009 <- read.csv("N:/Ring data by cruise/Shetland/R_0309A_S2009_WOD.csv",sep=",",header=TRUE)
LengthHeight <- merge(LengthHeight2013,LengthHeight2012,all=TRUE)
LengthHeight <- merge(LengthHeight,LengthHeight2011,all=TRUE)
LengthHeight <- merge(LengthHeight,LengthHeight2010,all=TRUE)
LengthHeight <- merge(LengthHeight,LengthHeight2009,all=TRUE)
I would like to know if there is a shorter/tidier way to do this, also considering that each time I run the script I might want to look at a different range of years.
I also found this bit of code by Tony Cookson which looks like it would do what I want, however the data frame it produces for me has only the correct headers but no data rows.
multmerge = function(mypath){
filenames=list.files(path=mypath, full.names=TRUE)
datalist = lapply(filenames, function(x){read.csv(file=x,header=T)})
Reduce(function(x,y) {merge(x,y)}, datalist)
mymergeddata = multmerge("C://R//mergeme")
Upvotes: 12
Views: 37594
Reputation: 21
You could try it in a tidy way like this:
files <- list.files('the path of you files',
pattern = ".csv$", recursive = TRUE, full.names = TRUE)
myMergedData <- read_csv(files) %>% bind_rows()
Upvotes: 2
Reputation: 56259
Find files (list.files
) and read the files in a loop (lapply
), then call (do.call
) row bind (rbind
) to put all files together by rows.
myMergedData <-
do.call(rbind,
lapply(list.files(path = "N:/Ring data by cruise"), read.csv))
Update: There is a vroom package, according to the manuals it is much faster than data.table::fread and base read.csv. The syntax looks neat, too:
library(vroom)
myMergedData <- vroom(files)
Upvotes: 26
Reputation: 9
Don't have enough rep to comment, but to answer Rafael Santos, you can use the code here to add params to the lapply in the answer above. Using lapply and read.csv on multiple files (in R)
Upvotes: 0
Reputation: 118889
If you're looking for speed, then try this:
require(data.table) ## 1.9.2 or 1.9.3
ans = rbindlist(lapply(filenames, fread))
Upvotes: 15