aua
aua

Reputation: 97

Combine Multiple CSV into One Data Frame

I have 1 folder consists of more than 100 csv files, each has different column names (15 column names), and different file names. The example would be as below:

[1] "Data/Yahoo_2014.csv"   "Data/Yahoo_2015.csv"  
[3] "Data/Yahoo_2016.csv"   "Data/Yahoo_2017.csv"  
[5] "Data/Yahoo_2018.csv"   "Data/Yahoo_2019.csv"  
[7] "Data/Yahoo_2020.csv"   "Data/Google_2014.csv"
[9] "Data/Google_2015.csv"  "Data/Google_2016.csv"

etc

Each csv has different column names. Example for Yahoo Data

Date Yahoo

for Google

Date Google

The only thing that is similar is the first column (Date). I want to merge all of this data into one csv file in R so that I can proceed to analyze it. The result should be as below:

Date Yahoo Google
1   2014-01-05  75  50
2   2014-01-12  84  6
3   2014-01-19  81  3
4   2014-01-26  82  35

I already looked at other questions in StackOverflow but found nothing similar. I came up with this solution but it won't work because they have different column names.

data <- read.csv(paste0("Data/","Yahoo_2014.csv"),
                       skip=2, 
                       na.strings="<1")

allFileNames <- list.files("Data")
All <- data.frame(matrix(, nrow=0, ncol=3))
names(All) <- c("Date","Yahoo","Google")
for (filename in allFileNames) {
  fullFilename <- paste0("Data/",filename)
  Data <- read.csv(fullFilename,
                         skip=2, 
                         na.strings="<1")
  names(trendsData) <- c("Date","Yahoo","Google")
  All <- rbind(All,Data)
}

Upvotes: 1

Views: 526

Answers (1)

dario
dario

Reputation: 6483

Edit

If this is a script that is run frequently one should really come up with an alternative in order to avoid growing objects:

Assuming there is always a column named Date in the first column and only two columns in total.

library(dplyr)
library(tidyr)
All <- bind_rows(sapply(allFileNames, function(x) {
    Data <- read.csv(filename,
               skip=2, 
               na.strings="<1",
               stringsAsFactors=FALSE)

    Data$site <- gsub(".*[[:punct:]]([A-z]+)_.*", "\\1", filename)
    names(Data) <- c("Date", "values", "site")
    return(Data)
})) %>%
  pivot_wider(names_from=site,
            values_from=values)

All

Older answer:

allFileNames <- list.files("Data", full.names = TRUE)
All <- read.csv(allFileNames[1]),
                skip=2, 
                na.strings="<1",
                stringsAsFactors=FALSE)
All$site <- gsub(".*[[:punct:]]([A-z]+)_.*", "\\1", allFileNames[1)
names(All) <- c("Date", "values", "site")

for (filename in allFileNames[-1]) {
  Data <- read.csv(filename,
                   skip=2, 
                   na.strings="<1",
                   stringsAsFactors=FALSE)
  
  Data$site <- gsub(".*[[:punct:]]([A-z]+)_.*", "\\1", filename)
  names(Data) <- c("Date", "values", "site")

  All <- rbind(All, Data)
}

library(dplyr)
library(tidyr)
All <- All %>%
  pivot_wider(names_from=site,
              values_from=values)
All

Upvotes: 1

Related Questions