Reputation: 97
I have 1 folder consists of more than 100 csv files, each has different column names (15 column names), and different file names. The example would be as below:
[1] "Data/Yahoo_2014.csv" "Data/Yahoo_2015.csv"
[3] "Data/Yahoo_2016.csv" "Data/Yahoo_2017.csv"
[5] "Data/Yahoo_2018.csv" "Data/Yahoo_2019.csv"
[7] "Data/Yahoo_2020.csv" "Data/Google_2014.csv"
[9] "Data/Google_2015.csv" "Data/Google_2016.csv"
etc
Each csv has different column names. Example for Yahoo Data
Date Yahoo
for Google
Date Google
The only thing that is similar is the first column (Date). I want to merge all of this data into one csv file in R so that I can proceed to analyze it. The result should be as below:
Date Yahoo Google
1 2014-01-05 75 50
2 2014-01-12 84 6
3 2014-01-19 81 3
4 2014-01-26 82 35
I already looked at other questions in StackOverflow but found nothing similar. I came up with this solution but it won't work because they have different column names.
data <- read.csv(paste0("Data/","Yahoo_2014.csv"),
skip=2,
na.strings="<1")
allFileNames <- list.files("Data")
All <- data.frame(matrix(, nrow=0, ncol=3))
names(All) <- c("Date","Yahoo","Google")
for (filename in allFileNames) {
fullFilename <- paste0("Data/",filename)
Data <- read.csv(fullFilename,
skip=2,
na.strings="<1")
names(trendsData) <- c("Date","Yahoo","Google")
All <- rbind(All,Data)
}
Upvotes: 1
Views: 526
Reputation: 6483
If this is a script that is run frequently one should really come up with an alternative in order to avoid growing objects:
Assuming there is always a column named Date
in the first column and only two columns in total.
library(dplyr)
library(tidyr)
All <- bind_rows(sapply(allFileNames, function(x) {
Data <- read.csv(filename,
skip=2,
na.strings="<1",
stringsAsFactors=FALSE)
Data$site <- gsub(".*[[:punct:]]([A-z]+)_.*", "\\1", filename)
names(Data) <- c("Date", "values", "site")
return(Data)
})) %>%
pivot_wider(names_from=site,
values_from=values)
All
Older answer:
allFileNames <- list.files("Data", full.names = TRUE)
All <- read.csv(allFileNames[1]),
skip=2,
na.strings="<1",
stringsAsFactors=FALSE)
All$site <- gsub(".*[[:punct:]]([A-z]+)_.*", "\\1", allFileNames[1)
names(All) <- c("Date", "values", "site")
for (filename in allFileNames[-1]) {
Data <- read.csv(filename,
skip=2,
na.strings="<1",
stringsAsFactors=FALSE)
Data$site <- gsub(".*[[:punct:]]([A-z]+)_.*", "\\1", filename)
names(Data) <- c("Date", "values", "site")
All <- rbind(All, Data)
}
library(dplyr)
library(tidyr)
All <- All %>%
pivot_wider(names_from=site,
values_from=values)
All
Upvotes: 1