Reputation: 580
I have 80 separate .csv files that have the same columns and headers that I was able to import and rbind as one dataframe using the following commands:
file_names <- dir("~/Desktop/data")
df <- do.call(rbind,lapply(file_names,read.csv))
But I would like to add a new variable ("name") that identifies which .csv file each observation came from. For example, "name" would be "NY" for all observations from the 'NY.csv' file and "DC" for all observations from the 'DC.csv' file, and so on. Is there any way to do this without adding the new column manually to each .csv? Thanks!
Upvotes: 3
Views: 7396
Reputation: 2043
With readr >= 2.0, just add the id option:
library(readr)
read_csv(file_names, id = "name")
If you would like to remove the .csv extension at the end:
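library(dplyr); library(stringr)
# escape the dot and anchor the pattern so only a trailing ".csv" is removed
read_csv(file_names, id = "name") %>%
  mutate(name = str_remove(name, "\\.csv$"))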
See this thread for more options.
Upvotes: 2
Reputation: 6073
Use the idcol argument of data.table's rbindlist() function:
# get a vector of all file names (full paths, so read.csv() can find the files)
myfiles <- list.files("path/to/directory/", full.names = TRUE)
# loop over file names, reading each file and saving the data.frame as a list element
n <- length(myfiles)
datalist <- vector(mode = "list", length = n)
for (i in seq_len(n)) {
  cat("importing file", i, ":", myfiles[i], "\n")
  datalist[[i]] <- read.csv(myfiles[i])
}
# name the list elements after their files
names(datalist) <- basename(myfiles)
# combine all data.frames in datalist; idcol = TRUE stores each element's name in a column called ".id"
all_data <- data.table::rbindlist(datalist, idcol = TRUE)
Upvotes: 0
Reputation: 1544
This should do it:
file_names <- dir("~/Desktop/data", full.names = TRUE)
df <- do.call(rbind, lapply(file_names, function(x) cbind(read.csv(x), name = strsplit(basename(x), '\\.')[[1]][1])))
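To try the idea without touching real data, here is a minimal self-contained sketch in base R. The temporary directory, the toy files, and the helper name read_with_name are made up for illustration; it assumes the files sit in one folder and that the part of the file name before the extension is the label you want.

```r
# Create two small CSV files in a temporary directory (made-up data)
tmp <- file.path(tempdir(), "csv_demo")
dir.create(tmp, showWarnings = FALSE)
write.csv(data.frame(x = 1:2), file.path(tmp, "NY.csv"), row.names = FALSE)
write.csv(data.frame(x = 3:5), file.path(tmp, "DC.csv"), row.names = FALSE)

# full.names = TRUE returns paths that read.csv() can open from any working directory
file_names <- list.files(tmp, pattern = "\\.csv$", full.names = TRUE)

# Hypothetical helper: read one file and tag its rows with the file name,
# stripped of its directory and extension
read_with_name <- function(path) {
  cbind(read.csv(path), name = tools::file_path_sans_ext(basename(path)))
}

df <- do.call(rbind, lapply(file_names, read_with_name))

# Count how many rows came from each file
table(df$name)
```

Using tools::file_path_sans_ext() rather than splitting on the first dot keeps names such as "New.York.csv" intact as "New.York".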
Upvotes: 3