Reputation: 1546
I have a list of CSV files. I can read them all using read_csv().
But I would like to add the file name as an identifier. How do I do that?
library(tidyverse)
# read file names
csv_filenames <- list.files(path = "OMITTED FOR THIS EXAMPLE",
                            full.names = TRUE)
# csv_filenames are "One.csv", "Two.csv", "Three.csv", ....
# read csv files
df <- read_csv(csv_filenames)
Upvotes: 0
Views: 681
Reputation: 72838
Just cbind() the basenames, actually (the native pipe |> and the \(x) lambda shorthand need R >= 4.1.0):
lapply(csv_filenames, \(x) cbind(read.csv(x), file=basename(x))) |> do.call(what=rbind)
# X1 X2 X3 X4 file
# 1 1 4 7 10 file1.csv
# 2 2 5 8 11 file1.csv
# 3 3 6 9 12 file1.csv
# 4 1 4 7 10 file2.csv
# 5 2 5 8 11 file2.csv
# 6 3 6 9 12 file2.csv
# 7 1 4 7 10 file3.csv
# 8 2 5 8 11 file3.csv
# 9 3 6 9 12 file3.csv
Data:
dat <- data.frame(matrix(1:12, 3, 4))
path <- '/what/so/ever'  # the tmp/ subfolder must already exist
lapply(1:3, \(i) write.csv(dat, paste0(path, '/tmp/file', i, '.csv'), row.names=FALSE))
csv_filenames <- list.files(paste0(path, "/tmp"), full.names=TRUE)
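A self-contained sketch of the same cbind() approach, assuming a throwaway folder under tempdir() instead of the placeholder path (the csv_example folder name is hypothetical). It uses function(x) instead of \(x), so it also runs on R versions older than 4.1:

```r
# example data: a 3 x 4 data frame with columns X1..X4
dat <- data.frame(matrix(1:12, 3, 4))

# hypothetical scratch folder inside the session's temporary directory
csv_dir <- file.path(tempdir(), "csv_example")
dir.create(csv_dir, showWarnings = FALSE)

# write three identical example files: file1.csv, file2.csv, file3.csv
for (i in 1:3) {
  write.csv(dat, file.path(csv_dir, paste0("file", i, ".csv")), row.names = FALSE)
}

# pattern is a regular expression: names ending in ".csv"
csv_filenames <- list.files(csv_dir, pattern = "\\.csv$", full.names = TRUE)

# read each file, append its base name as a column, then stack the results
result <- do.call(rbind, lapply(csv_filenames, function(x) {
  cbind(read.csv(x), file = basename(x))
}))
print(result)
```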
Upvotes: 0
Reputation: 61154
With base R:
csv_files <- lapply(csv_filenames, read.csv)
# strip everything from the first dot onward, e.g. "One.csv" -> "One"
file_names <- sub("\\..*", "", basename(csv_filenames))
out <- lapply(seq_along(csv_files), function(i) {
  transform(csv_files[[i]], file_name = file_names[i])
})
do.call(rbind, out)
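One thing to watch with sub("\\..*", "", ...): it cuts at the first dot, so a file named my.data.csv becomes my. A sketch using tools::file_path_sans_ext(), which removes only the final extension, together with Map() to pair each path with its name. The folder and file names are hypothetical examples written to tempdir():

```r
# hypothetical example files; note the extra dot in "my.data.csv"
csv_dir <- file.path(tempdir(), "csv_example_base")
dir.create(csv_dir, showWarnings = FALSE)
write.csv(data.frame(x = 1:2), file.path(csv_dir, "One.csv"), row.names = FALSE)
write.csv(data.frame(x = 3:4), file.path(csv_dir, "my.data.csv"), row.names = FALSE)

csv_filenames <- list.files(csv_dir, pattern = "\\.csv$", full.names = TRUE)

# file_path_sans_ext() drops only the final extension: "my.data.csv" -> "my.data"
file_names <- tools::file_path_sans_ext(basename(csv_filenames))

# Map() pairs each path with its name, so no index bookkeeping is needed
out <- Map(function(path, name) transform(read.csv(path), file_name = name),
           csv_filenames, file_names)

# unname() keeps rbind() from building row names out of the file paths
combined <- do.call(rbind, unname(out))
print(combined)
```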
Upvotes: 1
Reputation: 33782
read_csv() has an id argument (reading several files in one call, and id, were added in readr 2.0.0); if you set id = "path", you get a column named "path" containing the file paths:
csv_data <- read_csv(csv_filenames, id = "path")
If you want just the base file name, you can add a dplyr::mutate() step:
library(readr)
library(dplyr)
csv_data <- read_csv(csv_filenames, id = "path") %>%
  mutate(path = basename(path))
Upvotes: 1
Reputation: 5719
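library(dplyr)
# list of CSV files; pattern is a regular expression (not a glob like "*.csv"),
# and full.names = TRUE lets read.csv() find the files from any working directory
file_list <- list.files(path = "path/to/csv/files", pattern = "\\.csv$",
                        full.names = TRUE)
# read in all files and add the file name as an additional column
data_list <- lapply(file_list, function(x) {
  read.csv(file = x, stringsAsFactors = FALSE) %>%
    mutate(file_name = basename(x))
})
# stack the per-file data frames into one
data <- bind_rows(data_list)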
Upvotes: 1
Reputation: 17240
You should be able to use assign() with basename() in a for loop.
for (i in seq_along(csv_filenames)) {
  assign(basename(csv_filenames)[i], read.csv(csv_filenames[i]))
}
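Using basename() assigns each data frame to a new object in the global environment named after the file in the folder (not the whole file path obtained with full.names = TRUE). Because names like One.csv are not syntactic R names, you have to refer to these objects with backticks, e.g. `One.csv`.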
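A sketch of a common alternative to assign(): keep the data frames in a named list, so nothing is written into the global environment and the file names only appear as list names. The example files and the csv_example_list folder are hypothetical, written to tempdir():

```r
# hypothetical example folder and files, written to a temporary directory
csv_dir <- file.path(tempdir(), "csv_example_list")
dir.create(csv_dir, showWarnings = FALSE)
write.csv(data.frame(x = 1:2), file.path(csv_dir, "One.csv"), row.names = FALSE)
write.csv(data.frame(x = 3:4), file.path(csv_dir, "Two.csv"), row.names = FALSE)

csv_filenames <- list.files(csv_dir, pattern = "\\.csv$", full.names = TRUE)

# one list element per file, named after the file, instead of one global object per file
csv_list <- lapply(csv_filenames, read.csv)
names(csv_list) <- basename(csv_filenames)

# a single file's data is looked up by name, no backticks needed
print(csv_list[["One.csv"]])

# the list names can also become an identifier column in one combined data frame
combined <- do.call(rbind, unname(Map(function(df, name) cbind(df, file = name),
                                      csv_list, names(csv_list))))
print(combined)
```

With dplyr loaded, bind_rows(csv_list, .id = "file") does the last step in one call, taking the column values from the list names.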
Upvotes: 1