Reputation: 21
I have 332 CSV files, all with the same variables and the same format. I need to write a function that lets the user specify the folder where the CSV files are located and the ids of the files they want to combine into one data frame.
The file names follow this format: 001.csv, 002.csv ... 332.csv.
data <- function(directory, id_default = 1:332){
  setwd(paste0("/Users/", directory))
  id <- id_default
  for(i in length(id)){
    if(i < 10){
      aux <- paste0("00",i)
      filename <- paste0(aux,".csv")
    }else if(i < 100){
      aux <- paste0("0", i)
      filename <- paste0(aux, ".csv")
    }else if(i >= 100){
      filename <- paste0(i, ".csv")
    }
    my_dataframe <- do.call(rbind, lapply(filename, read.csv))
  }
  my_dataframe #Print dataframe
}
The problem is that it only stores the last CSV file. It seems that every time the loop runs, it overwrites the data frame with the file it has just read. How do I fix it?
Upvotes: 1
Views: 169
Reputation: 192
A tidy solution uses purrr (better suited than a loop for this task): https://purrr.tidyverse.org/reference/map.html
library(tidyverse)
directory <- "directory"
id <- c(1,20,300)
# add leading 0s with stringr's str_pad
id <- str_pad(id, 3, pad = "0")
It is best to avoid using setwd() like this. Instead, add directory to the file paths.
paths <- str_c(directory, "/", id, ".csv")
# map files to that function (similar to a loop) and stack rows
map_dfr(paths, read_csv)
Even better, use here(), which builds file paths relative to the project root so they keep working across machines: https://github.com/jennybc/here_here
paths <- str_c(
here::here(directory, id),
".csv")
# map files to that function (similar to a loop) and stack rows
map_dfr(paths, read_csv)
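Your example seems to want the default ids to be 1:332. If we wanted all files in the directory instead, we could use paths <- list.files(here::here(directory), full.names = TRUE). The full.names = TRUE argument matters: without it, list.files() returns bare file names that read_csv cannot find unless the working directory happens to be that folder.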
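As a rough illustration of how list.files() behaves, here is a small base R sketch. The folder demo_dir and the file names in it are made up for the demo, and the pattern argument is an extra I am assuming you would want so that non-CSV files are skipped.

```r
# Throwaway folder with two empty CSV files and one unrelated file (all hypothetical)
demo_dir <- file.path(tempdir(), "list_files_demo")
dir.create(demo_dir, showWarnings = FALSE)
file.create(file.path(demo_dir, c("001.csv", "002.csv", "notes.txt")))

# Default call: bare file names only, including the non-CSV file
print(list.files(demo_dir))

# pattern keeps only names ending in .csv; full.names = TRUE prepends the folder
paths <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)
print(basename(paths))  # [1] "001.csv" "002.csv"
```

The resulting paths vector can be passed straight to map_dfr(paths, read_csv) or to lapply(paths, read.csv).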
read_my_data <- function(directory, id = 1:332){
  paths <- str_c(
    here::here(directory, str_pad(id, 3, pad = "0")),
    ".csv")
  map_dfr(paths, read_csv)
}
read_my_data("directory")
If you need to iterate over several arguments in parallel, for example to combine files from multiple directories, you can use pmap_dfr().
Upvotes: 0
Reputation: 887991
Here, the loop runs over a single value, length(id), so only the last index is ever visited. Instead it should be
for(i in 1:length(id))
Or more correctly
for(i in seq_along(id))
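To make the difference concrete, here is a small sketch that records which indices each loop form visits. The id vector and the variable names wrong and right are made up for illustration.

```r
# Hypothetical ids, only for illustration
id <- c(1, 20, 300)

# for (i in length(id)) iterates over the single value 3
wrong <- integer(0)
for (i in length(id)) wrong <- c(wrong, i)

# for (i in seq_along(id)) iterates over 1, 2, 3
right <- integer(0)
for (i in seq_along(id)) right <- c(right, i)

print(wrong)  # [1] 3
print(right)  # [1] 1 2 3

# seq_along() is also safe for empty input, where 1:length(x) counts down
empty <- integer(0)
print(1:length(empty))   # [1] 1 0
print(seq_along(empty))  # integer(0)
```

Note that i is a position, not an id. Inside the loop the file name should be built from id[i]; otherwise a call with id = c(1, 20, 300) would read files 001, 002 and 003.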
In addition to the issue with looping, the if/else if chain is not really needed. We could use sprintf
filenames <- sprintf('%03d.csv', id)
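As a quick check of what that format string produces, here is a tiny example with made-up ids. The %03d format pads each whole number with leading zeros to a width of three.

```r
# Hypothetical ids, only for illustration
id <- c(1, 20, 300)

# sprintf is vectorised, so one call handles every id
filenames <- sprintf("%03d.csv", id)
print(filenames)  # [1] "001.csv" "020.csv" "300.csv"
```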
i.e.
data <- function(directory, id_default = 1:332){
  setwd(paste0("/Users/", directory))
  filenames <- sprintf('%03d.csv', id_default)
  do.call(rbind, lapply(filenames, read.csv))
}
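If you would rather not change the working directory, one possible variant builds full paths with file.path() instead. This is only a sketch: read_selected() and demo_dir are hypothetical names, and it assumes directory is already a usable path rather than something to be appended to "/Users/".

```r
# Sketch: same idea as above, but with full paths instead of setwd().
# read_selected() is a hypothetical name, not part of the original code.
read_selected <- function(directory, id = 1:332) {
  paths <- file.path(directory, sprintf("%03d.csv", id))
  do.call(rbind, lapply(paths, read.csv))
}

# Demo on throwaway files written to a temporary folder
demo_dir <- file.path(tempdir(), "csv_demo")
dir.create(demo_dir, showWarnings = FALSE)
for (k in c(1, 20, 300)) {
  write.csv(data.frame(id = k, value = k * 2),
            file.path(demo_dir, sprintf("%03d.csv", k)),
            row.names = FALSE)
}

combined <- read_selected(demo_dir, id = c(1, 20, 300))
print(nrow(combined))  # [1] 3
```

Because the paths are absolute, the caller's working directory is left untouched after the function returns.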
Upvotes: 2