Cole

Reputation: 229

Trying to merge multiple csv files in R

I'm attempting to merge multiple CSV files using R. All of the files have the same fields and live in a shared folder that contains only these CSV files. I've attempted to do it with the following code:

multmerge=function(mypath) {
    filenames=list.files(path=mypath, full.names=TRUE)
    datalist= lapply(filenames, function (x) read.csv(file=x, header=True))
    Reduce(function(x,y) merge(x,y), datalist)}

I am entering my path as something like "Y:/R Practice/specdata". I do get an output when I apply the function to my 300 or so CSV files, but the result only gives me my column names; beneath them it says <0 rows> (or 0-length row.names). Please let me know if you have any suggestions on why this isn't working and how I can fix it.
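For context on the symptom: merge() joins on every column the frames share (an inner join by default), so stacking identical-schema files this way keeps only rows that appear in all of them. A minimal sketch of the behavior, using made-up frames:

a <- data.frame(id = 1:2, value = c(10, 20))
b <- data.frame(id = 3:4, value = c(30, 40))
merge(a, b)  # <0 rows> -- no row occurs in both frames
rbind(a, b)  # 4 rows -- the vertical stacking intended here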

Upvotes: 19

Views: 95252

Answers (10)

cannin

Reputation: 3285

The following code is similar to the answer by @Michael but uses only base R functions, replacing rbindlist with do.call("rbind", ...).

multmerge <- function(path) {
  # full.names=TRUE gives paths read.csv can open from any working directory
  files <- list.files(path, pattern="\\.csv$", full.names=TRUE)
  lst <- lapply(files, function(x){ read.csv(x, header=TRUE, stringsAsFactors=FALSE) })
  do.call("rbind", lst)
}
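
Usage, with the asker's folder as an example:

df <- multmerge("Y:/R Practice/specdata")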

Upvotes: 0

Sanguine-Otter

Reputation: 11

I used the rbind() function; just separate the data frames you want to merge with commas.

After reading each of the CSV files into its own data frame (read_csv() comes from the readr package):

library(readr)
df1 <- read_csv("name_of_file.csv")
df2 <- read_csv("name_of_file2.csv")
df4 <- read_csv("name_of_file4.csv")

you can then use rbind() to concatenate them into a single data frame:

all_of_the_marbles <- rbind(df1, df2, df4)

Upvotes: 0

Hope

Reputation: 119

Let me share the fastest approach I have used:

library(pacman)
p_load(doParallel, data.table, magrittr, stringr)

# get the names of the CSV files in the working directory
dir() %>% str_subset("\\.csv$") -> fn

# use parallel setting: one worker per detected core
(cl = detectCores() %>% 
  makeCluster()) %>% 
  registerDoParallel()

# read and bind
system.time({
  big_df = foreach(i = fn,
                   .packages = "data.table") %dopar% {
                     fread(i, colClasses = "character")
                   } %>% 
    rbindlist(fill = TRUE)
})

# end of parallel work
stopCluster(cl)

This should be faster when your computer has multiple cores, and it is preferable when you are dealing with big data.

Upvotes: 1

user160623

Reputation: 316

For anyone who has many CSVs with the same header (I had ~1000), here is a quicker approach that avoids parsing each CSV individually.

filenames <- list.files(path=mypath, full.names=TRUE)

#read the files in as plaintext
csv_list <- lapply(filenames, readLines)

#drop the header line from all but the first file
csv_list[-1] <- lapply(csv_list[-1], "[", -1)

#unlist to create a character vector
csv_list <- unlist(csv_list)

#write the csv as one single file
writeLines(text=csv_list,
           con="all_my_csvs_combined.csv")

#read the csv as one single file
all_my_csvs_combined <- read.csv("all_my_csvs_combined.csv")

Upvotes: 2

OzcanK

Reputation: 51

To combine many CSV files:

  1. Set your working directory to the folder that contains the CSV files.
  2. Run:
    library(plyr)
    # note: sep='\t' assumes tab-delimited files; drop it for comma-separated CSVs
    veri <- ldply(list.files(pattern="\\.csv$"), read.csv, header=TRUE, sep='\t')
    View(veri)


Upvotes: 0

Maiasaura

Reputation: 32986

For a shorter, faster solution:

library(dplyr)
library(readr)
df <- list.files(path="yourpath", full.names = TRUE) %>% 
  lapply(read_csv) %>% 
  bind_rows()
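
As a side note: newer versions of readr (2.0+, if available to you) let read_csv() take the whole vector of paths directly, with an optional id column recording each row's source file:

df <- list.files(path="yourpath", full.names = TRUE) %>% 
  read_csv(id = "source_file")  # "source_file" is an arbitrary column name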

Upvotes: 46

Mustapha Wasseja

Reputation: 21

I tried working with the same function but included all = TRUE in the merge call, and it worked just fine.

The code I used is as follows:

multmerge = function(mypath){
  filenames = list.files(path=mypath, full.names=TRUE)
  datalist = lapply(filenames, function(x){read.csv(file=x, header=TRUE)})
  Reduce(function(x,y) {merge(x, y, all=TRUE)}, datalist)
}

full_data = multmerge("path_name for your csv folder")

Hope this helps. Cheers!

Upvotes: 2

Michael

Reputation: 2556

Another option that has proved to work for my setup:

library(data.table)

multmerge = function(path){
  filenames = list.files(path=path, full.names=TRUE)
  rbindlist(lapply(filenames, fread))
}


path <- "Dropbox/rstudio-share/dataset/MB"
DF <- multmerge(path)

If you need more granular control over your CSV files during the loading process, you can replace fread with a custom function like so:

multmerge = function(path){
  filenames=list.files(path=path, full.names=TRUE)
  rbindlist(lapply(filenames, function(x){read.csv(x, stringsAsFactors=FALSE, sep=';')}))
}

Upvotes: 4

wctjerry

Reputation: 53

If all your csv files have exactly the same fields (column names) and you want simply to combine them vertically, you should use rbind instead of merge:

> a
             A         B
[1,]  2.471202 38.949232
[2,] 16.935362  6.343694
> b
            A          B
[1,] 0.704630  0.1132538
[2,] 4.477572 11.8869057
> rbind(a, b)
             A          B
[1,]  2.471202 38.9492316
[2,] 16.935362  6.3436939
[3,]  0.704630  0.1132538
[4,]  4.477572 11.8869057
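
Applied to the function in the question, that means swapping the Reduce/merge step for a row bind; a minimal sketch:

multmerge <- function(mypath) {
  filenames <- list.files(path=mypath, full.names=TRUE)
  datalist <- lapply(filenames, read.csv, header=TRUE)
  do.call(rbind, datalist)  # stack the frames instead of joining them
}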

Upvotes: 2

Matt Upson

Reputation: 601

Your code worked for me, but you need to change header = True to header = TRUE.
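
That is, the corrected line in the question's function reads:

datalist = lapply(filenames, function(x) read.csv(file=x, header=TRUE))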

Upvotes: 3
