Reputation: 31
I am a newbie to R, but I have been given a large dataset to work with, and I am wondering whether I can do this process in R instead of manually. I have a folder with about 600 csv files, each with two columns and no headers. Every file has the same first column but a different second column. I would like to combine them in the form detailed below:
File1.csv
1|A
2|A
3|A
4|A
File2.csv
1|B
2|B
3|B
4|B
And I would like to combine them to:
ID|File 1|File 2
1|A|B
2|A|B
3|A|B
4|A|B
The current code I have is:
library("dplyr")
library("plyr")
library("readr")
data <- list.files(pattern = "*.csv", full.names = TRUE) %>%
  lapply(read_csv) %>%
  bind_rows
data
This works to an extent. However, because the files have no headers, read_csv() treats the first data row as the column headings, like this:
1|A|B
2|A|B
3|A|B
4|A|B
This means I would have to enter the headers manually. I would be grateful for any help!
Upvotes: 3
Views: 1727
Reputation: 393
It sounds like you want to join your files on their shared first column rather than stack them row-wise with bind_rows(). Does the following solve your problem?
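library(tidyverse)
# Read one headerless file; name the columns "id" and the file path.
my_read <- function(x) {
  tmp <- read_csv(x, col_names = FALSE)
  names(tmp) <- c("id", x)
  tmp
}
# list.files() expects a regular expression, so "\\.csv$" matches names ending in ".csv".
data <- list.files(pattern = "\\.csv$", full.names = TRUE) %>%
  map(my_read) %>%
  # Join each file onto the running result by the shared "id" column.
  reduce(left_join, by = "id")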
First, you can create a small function that reads your data in without headers and assigns the file name as the name of the second column. reduce() comes from {purrr} (as does map()) and iteratively applies a function to the elements of a list. In this case, we keep left_join()'ing our data together (on whatever your common first column is) until we get a single data frame.
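To make the approach concrete, here is a minimal, self-contained sketch that writes two small files shaped like the ones in the question and then combines them. Several things in it are assumptions rather than part of the original code: the temporary folder `csv_join_demo`, the helper name `read_named`, the use of comma-separated files, and the choice to strip the directory and extension from each file name with basename() and tools::file_path_sans_ext() so the columns come out as `File1` and `File2`.

```r
library(readr)
library(dplyr)
library(purrr)

# Hypothetical demo folder holding two headerless, comma-separated files.
demo_dir <- file.path(tempdir(), "csv_join_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(c("1,A", "2,A", "3,A", "4,A"), file.path(demo_dir, "File1.csv"))
writeLines(c("1,B", "2,B", "3,B", "4,B"), file.path(demo_dir, "File2.csv"))

# Read one file without headers. The second column is named after the file,
# with the directory and the ".csv" extension removed (e.g. "File1").
read_named <- function(path) {
  value_name <- tools::file_path_sans_ext(basename(path))
  read_csv(path, col_names = c("ID", value_name))
}

# The pattern is a regular expression matching names that end in ".csv".
files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# Read every file, then fold the list into one data frame,
# joining each new file onto the running result by "ID".
combined <- files %>%
  map(read_named) %>%
  reduce(left_join, by = "ID")

combined
```

Because left_join() keeps only the IDs present in the first file, this relies on every file sharing the same first column, as described in the question. If some files might contain extra IDs, swapping in full_join() would keep them.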
Upvotes: 2