Agustín Indaco

Reputation: 580

Importing multiple .csv files into R and adding a new column with file name

I have 80 separate .csv files with the same columns and headers. I was able to import them and rbind them into one data frame using the following commands:

 file_names <- dir("~/Desktop/data") 
 df <- do.call(rbind,lapply(file_names,read.csv))

But I would like to add a new variable ("name") that identifies which .csv file each observation came from. For example, "name" would be "NY" for all observations from the 'NY.csv' file, "DC" for all observations from the 'DC.csv' file, and so on. Is there a way to do this without manually adding the column to each .csv? Thanks!

Upvotes: 3

Views: 7396

Answers (3)

Rodrigo Zepeda

Reputation: 2043

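With readr >= 2.0, just pass the id argument to read_csv(), which accepts a vector of file paths (the paths must be full paths or resolve from the working directory):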

library(readr)
read_csv(file_names, id = "name")

If you would like to strip the .csv extension from the name column:

library(dplyr)
library(stringr)
read_csv(file_names, id = "name") %>%
   mutate(name = str_remove(name, "\\.csv$"))

See this thread for more options.
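A self-contained sketch of the same idea, in case it helps to try it on throwaway data first. The temporary directory, the two tiny files, and the column x are made up for illustration; only read_csv() with its id argument comes from the answer above. Since id stores each path exactly as it was passed in, the sketch also drops the directory and the extension so that name ends up as "NY" or "DC".

```r
library(readr)

# Hypothetical demo data: two small CSV files in a temporary directory
demo_dir <- file.path(tempdir(), "csv_id_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(x = 1:2), file.path(demo_dir, "NY.csv"), row.names = FALSE)
write.csv(data.frame(x = 3:5), file.path(demo_dir, "DC.csv"), row.names = FALSE)

# Full paths, so the files are found regardless of the working directory
file_names <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# id = "name" adds a column holding the path each row was read from
combined <- read_csv(file_names, id = "name", show_col_types = FALSE)

# Reduce the stored path to the bare file name without its extension
combined$name <- tools::file_path_sans_ext(basename(combined$name))
```

Using basename() together with tools::file_path_sans_ext() avoids writing a regular expression for the extension by hand.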

Upvotes: 2

DanY

Reputation: 6073

Use the idcol argument from data.table's rbindlist() function:

# get a vector of all file paths (full.names = TRUE so read.csv() can find the files)
myfiles <- list.files("path/to/directory/", full.names = TRUE)

# loop over file names, reading in and saving each data.frame as an element in a list
n <- length(myfiles)
datalist <- vector(mode="list", length=n)
for(i in seq_len(n)) {
    cat("importing file", i, ":", myfiles[i], "\n")
    datalist[[i]] <- read.csv(myfiles[i])
}

# assign list elements the file names (directory part dropped)
names(datalist) <- basename(myfiles)

# combine all data.frames in datalist; idcol stores each list element's name (the file name) in a column called "name"
all_data <- data.table::rbindlist(datalist, idcol="name")

Upvotes: 0

mpjdem

Reputation: 1544

This should do it:

file_names <- dir("~/Desktop/data", full.names = TRUE)
df <- do.call(rbind, lapply(file_names, function(x) cbind(read.csv(x), name = tools::file_path_sans_ext(basename(x)))))
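For anyone who wants to check the base R approach on throwaway data, here is a minimal sketch. The temporary directory, the two tiny files, the column x, and the helper read_with_name() are all hypothetical names chosen for illustration; the technique itself (read each file, attach its name, rbind the results) is the one from the one-liner above.

```r
# Hypothetical demo data: two small CSV files in a temporary directory
demo_dir <- file.path(tempdir(), "csv_name_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(x = 1:2), file.path(demo_dir, "NY.csv"), row.names = FALSE)
write.csv(data.frame(x = 3:5), file.path(demo_dir, "DC.csv"), row.names = FALSE)

# Full paths, so read.csv() works regardless of the working directory
file_names <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# Read one file and tag every row with the file name minus its extension
read_with_name <- function(path) {
  data <- read.csv(path)
  label <- tools::file_path_sans_ext(basename(path))
  # rep() keeps this safe even if a file has a header but no rows
  data$name <- rep(label, nrow(data))
  data
}

# Stack all files into one data frame
df <- do.call(rbind, lapply(file_names, read_with_name))
```

Pulling the per-file step into a named function keeps the do.call() line short and makes it easier to add more per-file cleanup later.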

Upvotes: 3
