Reputation: 229
I'm attempting to merge multiple CSV files using R. All of the CSV files have the same fields and are in a shared folder that contains only these CSV files. I've attempted to do it using the following code:
multmerge = function(mypath) {
  filenames = list.files(path = mypath, full.names = TRUE)
  datalist = lapply(filenames, function(x) read.csv(file = x, header = True))
  Reduce(function(x, y) merge(x, y), datalist)
}
I am entering my path as something like "Y:/R Practice/specdata". I do get an output when I apply the function to my 300 or so CSV files: the result shows my column names, but beneath them it says <0 rows> (or 0-length row.names). Please let me know if you have any suggestions on why this isn't working and how I can fix it.
Upvotes: 19
Views: 95252
Reputation: 3285
The following code is similar to the answer by @Michael but uses only base R functions; rbindlist is replaced with do.call("rbind", ...).
multmerge <- function(path) {
  # full.names = TRUE so read.csv receives the complete path to each file
  files <- list.files(path, pattern = "\\.csv$", full.names = TRUE)
  lst <- lapply(files, function(x) read.csv(x, header = TRUE, stringsAsFactors = FALSE))
  do.call("rbind", lst)
}
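As a minimal usage sketch, assuming the folder from the question (the path is just an example):
# hypothetical path; point it at your own folder of CSV files
combined <- multmerge("Y:/R Practice/specdata")
head(combined)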
Upvotes: 0
Reputation: 11
I used the rbind() function. Read each of the CSV files separately first, for example with read_csv() from the readr package, and then use rbind() to concatenate the resulting data frames into a single one:
df1 <- read_csv("name_of_file.csv")
df2 <- read_csv("name_of_file2.csv")
df3 <- read_csv("name_of_file4.csv")
all_of_the_marbles <- rbind(df1, df2, df3)
Upvotes: 0
Reputation: 119
Let me give you the best approach I have ever used:
library(pacman)
p_load(doParallel, data.table, stringr, magrittr)  # magrittr supplies the %>% pipe
# get the file names
dir() %>% str_subset("\\.csv$") -> fn
# use parallel setting
(cl = detectCores() %>%
    makeCluster()) %>%
  registerDoParallel()
# read and bind
system.time({
  big_df = foreach(i = fn,
                   .packages = "data.table") %dopar% {
    fread(i, colClasses = "character")
  } %>%
    rbindlist(fill = T)
})
# end of parallel work
stopCluster(cl)
This should be faster as long as your computer has multiple cores, and it is the preferable approach if you are dealing with big data.
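For comparison, a sequential baseline under the same assumptions (reusing the fn vector of file names) might look like this, so you can time both and see the gain from extra cores:
# hypothetical sequential baseline for the timing comparison
system.time({
  big_df_seq = rbindlist(lapply(fn, fread, colClasses = "character"), fill = T)
})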
Upvotes: 1
Reputation: 316
For anyone who has many csvs with the same header (I had ~1000), here is a quicker approach, which avoids parsing the csvs individually.
filenames <- list.files(path=mypath, full.names=TRUE)
#read the files in as plain text
csv_list <- lapply(filenames, readLines)
#remove the header line from all but the first file
csv_list[-1] <- lapply(csv_list[-1], "[", -1)
#unlist to create a character vector
csv_list <- unlist(csv_list)
#write the csv as one single file
writeLines(text=csv_list,
           con="all_my_csvs_combined.csv")
#read the csv as one single file
all_my_csvs_combined <- read.csv("all_my_csvs_combined.csv")
Upvotes: 2
Reputation: 51
To combine many CSV files:
library(plyr)
# sep='\t' assumes tab-delimited files; drop it (read.csv defaults to sep=',') for ordinary CSVs
veri <- ldply(list.files(), read.csv, header=TRUE, sep='\t')
View(veri)
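If the files are not in your working directory, a variant of the same approach (the path below is just an example) can pass full paths to ldply():
# hypothetical folder; full.names=TRUE returns complete paths for read.csv
veri <- ldply(list.files(path = "Y:/R Practice/specdata", full.names = TRUE),
              read.csv, header = TRUE)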
Upvotes: 0
Reputation: 32986
For a shorter, faster solution:
library(dplyr)
library(readr)
df <- list.files(path="yourpath", full.names = TRUE) %>%
lapply(read_csv) %>%
bind_rows
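If your readr version is 2.0 or newer, read_csv() can also take the whole vector of paths in one call (a hedged alternative; check your installed version):
# assumes readr >= 2.0, which accepts a vector of file paths directly
df <- list.files(path = "yourpath", full.names = TRUE) %>%
  read_csv()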
Upvotes: 46
Reputation: 21
I tried working with the same function but included all = TRUE in the merge() call, and it worked just fine. With all = TRUE, merge performs a full outer join, so rows that have no match in the other data frame are kept instead of dropped. The code I used is as follows:
multmerge = function(mypath){
  filenames = list.files(path=mypath, full.names=TRUE)
  datalist = lapply(filenames, function(x){read.csv(file=x, header=TRUE)})
  Reduce(function(x,y) {merge(x, y, all = TRUE)}, datalist)
}
full_data = multmerge("path_name for your csv folder")
Hope this helps. Cheers!
Upvotes: 2
Reputation: 2556
Another option that has proved to work for my setup:
library(data.table)  # provides fread() and rbindlist()

multmerge = function(path){
  filenames = list.files(path=path, full.names=TRUE)
  rbindlist(lapply(filenames, fread))
}
path <- "Dropbox/rstudio-share/dataset/MB"
DF <- multmerge(path)
If you need more granular control over how each CSV file is loaded, you can replace fread with a function like so:
multmerge = function(path){
  filenames = list.files(path=path, full.names=TRUE)
  rbindlist(lapply(filenames, function(x){read.csv(x, stringsAsFactors = FALSE, sep=';')}))
}
Upvotes: 4
Reputation: 53
If all your CSV files have exactly the same fields (column names) and you simply want to combine them vertically, you should use rbind instead of merge:
> a
A B
[1,] 2.471202 38.949232
[2,] 16.935362 6.343694
> b
A B
[1,] 0.704630 0.1132538
[2,] 4.477572 11.8869057
> rbind(a, b)
A B
[1,] 2.471202 38.9492316
[2,] 16.935362 6.3436939
[3,] 0.704630 0.1132538
[4,] 4.477572 11.8869057
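The same idea applies to data frames read from CSV files; as a minimal sketch (the file names below are placeholders), note that rbind requires matching column names:
# hypothetical file names for illustration
a <- read.csv("first_file.csv")
b <- read.csv("second_file.csv")
combined <- rbind(a, b)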
Upvotes: 2
Reputation: 601
Your code worked for me, but you need to change header = True to header = TRUE.
Upvotes: 3