Nicole

Reputation: 13

dplyr - Need to add the CSV filename after importing all files from a directory

I have used this code to import all CSV files from a directory:

library(purrr)
library(readr)

orgs <-
  list.files(pattern = "\\.csv$") %>% 
  map_df(~read_csv(., col_types = cols(.default = "c")))

This successfully combines all the files in the directory into one data frame.

I am looking for a way to add the name of each imported CSV file as an additional variable.

Something like:

Imported Data   Filename
data 1          csv1.csv
data 2          csv2.csv

Upvotes: 1

Views: 53

Answers (3)

zephryl

Reputation: 17204

First use purrr::set_names() to name the vector of file paths after the paths themselves, then use the .id argument of map_dfr() to store those names in a column:

library(purrr)
library(readr)

orgs <- list.files(pattern = "\\.csv$") %>% 
  set_names() %>%
  map_dfr(
    read_csv,
    col_types = cols(.default = "c"),
    .id = "Filename"
  )

orgs
# # A tibble: 4 × 2
#   Filename x    
#   <chr>    <chr>
# 1 dat1.csv 1    
# 2 dat1.csv 2    
# 3 dat2.csv 3    
# 4 dat2.csv 4    

Sample data:

write.csv(data.frame(x = 1:2), "dat1.csv", row.names=FALSE)
write.csv(data.frame(x = 3:4), "dat2.csv", row.names=FALSE)
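To see what the two pieces contribute, here is a small sketch that skips the files entirely: a one-row data.frame() stands in for read_csv() (an assumption made only so the example runs anywhere). set_names() with no second argument names each element after its own value, and map_dfr() then copies those names into the column given by .id. Note that map_dfr() needs dplyr to be installed.

```r
library(purrr)

# With no `nm` argument, set_names() names each element after its own value
files <- set_names(c("dat1.csv", "dat2.csv"))
print(names(files))
#> [1] "dat1.csv" "dat2.csv"

# map_dfr() copies the list names into the column named by `.id`.
# A tiny data frame stands in for read_csv() so that no files are needed.
res <- map_dfr(files, \(f) data.frame(chars = nchar(f)), .id = "Filename")
print(res)
```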

Upvotes: 1

Edward

Reputation: 19339

Something like this?

lapply(list.files(pattern = "\\.csv$"), \(x) {
  read_csv(x, col_types = cols(.default = "c")) |>
    mutate(Filename = x)
}) |>
  bind_rows()

Returns (for the data generated below):

   Sepal.Length Sepal.Width Petal.Length Petal.Width Species Filename 
   <chr>        <chr>       <chr>        <chr>       <chr>   <chr>    
 1 5.1          3.5         1.4          0.2         setosa  iris1.csv
 2 4.9          3           1.4          0.2         setosa  iris1.csv
 3 4.7          3.2         1.3          0.2         setosa  iris1.csv
 4 4.6          3.1         1.5          0.2         setosa  iris1.csv
 5 5            3.6         1.4          0.2         setosa  iris1.csv
 6 5.4          3.9         1.7          0.4         setosa  iris2.csv
 7 4.6          3.4         1.4          0.3         setosa  iris2.csv
 8 5            3.4         1.5          0.2         setosa  iris2.csv
 9 4.4          2.9         1.4          0.2         setosa  iris2.csv
10 4.9          3.1         1.5          0.1         setosa  iris2.csv

Note that map_df() (like map_dfr()) was superseded in purrr 1.0.0 in favor of map() followed by list_rbind().
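For purrr 1.0.0 or later, a minimal sketch of the non-superseded equivalent, assuming the same col_types as above: map() keeps the names of its input, and list_rbind(names_to = ...) turns those names into a column. The temporary directory and the dat1.csv/dat2.csv files are hypothetical sample data; because they are not in the working directory, full.names = TRUE is used to read them and basename() supplies the bare file names.

```r
library(purrr)
library(readr)

# Hypothetical sample data: two small CSVs written to a temporary directory
dir <- tempfile("csvs")
dir.create(dir)
write.csv(data.frame(x = 1:2), file.path(dir, "dat1.csv"), row.names = FALSE)
write.csv(data.frame(x = 3:4), file.path(dir, "dat2.csv"), row.names = FALSE)

# Full paths are needed to read from `dir`; the names keep only the file name
files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
names(files) <- basename(files)

orgs <- files |>
  map(\(f) read_csv(f, col_types = cols(.default = "c"))) |>
  list_rbind(names_to = "Filename")   # list names become the Filename column

print(orgs)
```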


Data:

library(dplyr)
library(readr)

write.csv(iris[1:5,], "iris1.csv", row.names=FALSE)
write.csv(iris[6:10,], "iris2.csv", row.names=FALSE)

Upvotes: 1

Parfait

Reputation: 107737

Consider passing an anonymous function to map_dfr() instead of a formula, so that the file name parameter can be used twice, once in read_csv() and once in mutate():

library(dplyr)
library(purrr)
library(readr)

orgs <-
  list.files(pattern = "\\.csv$") %>% 
  map_dfr(\(f) {
    read_csv(f, col_types = cols(.default = "c")) %>% 
      mutate(file_name = f)
  })
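The reason the formula form is awkward here is the `.` placeholder: inside a formula, `.` is the file name, but once you pipe with %>%, `.` refers to the data frame on the left-hand side. A small sketch, using a hypothetical fake_read() in place of read_csv() so no files are needed, shows that a named parameter avoids the clash; spelling the formula argument as .x also works, since %>% only rebinds `.`.

```r
library(dplyr)
library(purrr)

files <- c("a.csv", "b.csv")   # hypothetical names; nothing is read from disk

# Hypothetical stand-in for read_csv() so the example runs without files
fake_read <- function(path) tibble(rows = nchar(path))

# Anonymous function: `f` means the file name everywhere in the body
with_fn <- map_dfr(files, \(f) fake_read(f) %>% mutate(file_name = f))

# Formula form: `.` would refer to the piped data frame inside mutate(),
# so the file name has to be written as `.x` instead
with_formula <- map_dfr(files, ~ fake_read(.x) %>% mutate(file_name = .x))

print(with_fn)
```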

Upvotes: 0
