Code.vc

Reputation: 21

Add multiple dataframes to one dataframe without overwriting the existing dataframe in R

I have 332 CSV files, each with the same variables and the same format. I need to write a function that lets the user specify the folder where the CSV files are located and the ids of the files to combine into one data frame.

The file names follow this format: 001.csv, 002.csv ... 332.csv.

data <- function(directory, id_default = 1:332){
setwd(paste0("/Users/", directory))

id <- id_default

for(i in length(id)){
    if(i < 10){
        aux <- paste0("00",i)
        filename <- paste0(aux,".csv")
    }else if(i < 100){
        aux <- paste0("0", i)
        filename <- paste0(aux, ".csv")
    }else if(i >= 100){
        filename <- paste0(i, ".csv")
    }
    my_dataframe <- do.call(rbind, lapply(filename, read.csv))

}
my_dataframe #Print dataframe

}

The problem is that it only stores the last CSV file; it seems that every pass through the loop overwrites the data frame with the latest CSV file. How do I fix it?

Upvotes: 1

Views: 169

Answers (2)

Devin Judge-Lord

Reputation: 192

A tidy solution will use purrr (better than a loop for this task): https://purrr.tidyverse.org/reference/map.html

library(tidyverse)
directory <- "directory"
id <- c(1,20,300)

# add leading 0s with stringr's str_pad
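# (plain assignment is used here because the %<>% operator needs library(magrittr),
# which library(tidyverse) does not attach)
id <- str_pad(id, 3, pad = "0")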

It is best to avoid using setwd() like this.

Instead, add directory to the file paths.

paths <- str_c(directory, "/", id, ".csv")

# map files to that function (similar to a loop) and stack rows
map_dfr(paths, read_csv)

Even better, use here::here(), which builds paths relative to the project root so they do not depend on the current working directory: https://github.com/jennybc/here_here

paths <- str_c(
      here::here(directory, id),
      ".csv")

# map files to that function (similar to a loop) and stack rows
map_dfr(paths, read_csv)

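Your example seems to want the default ids to be 1:332. If we wanted all CSV files in the directory, we could use paths <- list.files(here::here(directory), pattern = "\\.csv$", full.names = TRUE). Without full.names = TRUE, list.files() returns bare file names, which read_csv() could not find from another working directory.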

read_my_data <- function(directory, id = 1:332){
  paths <- str_c(
      here::here(directory, str_pad(id, 3, pad = "0")),
      ".csv")
  map_dfr(paths, read_csv)
} 

read_my_data("directory")

If you need to combine files from multiple directories in parallel, you can use pmap_dfr()
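As a rough sketch of that idea: "in parallel" here means that pmap_dfr() walks several argument vectors element by element, not that it runs concurrently. The helper read_one(), the folder names site_a and site_b, and the demo data are all hypothetical, and the example assumes purrr and dplyr are installed.

```r
library(purrr)

# Hypothetical helper: read one file given its directory and numeric id.
read_one <- function(directory, id) {
  read.csv(file.path(directory, sprintf("%03d.csv", id)))
}

# Demo data (made up): one small CSV in each of two temporary folders.
dir_a <- file.path(tempdir(), "site_a")
dir_b <- file.path(tempdir(), "site_b")
dir.create(dir_a, showWarnings = FALSE)
dir.create(dir_b, showWarnings = FALSE)
write.csv(data.frame(id = 1, value = 10), file.path(dir_a, "001.csv"), row.names = FALSE)
write.csv(data.frame(id = 2, value = 20), file.path(dir_b, "002.csv"), row.names = FALSE)

# Each list element supplies one argument of read_one(); pmap_dfr() pairs them
# up position by position and row-binds the resulting data frames.
args <- list(directory = c(dir_a, dir_b), id = c(1, 2))
combined <- pmap_dfr(args, read_one)
```

The list names must match the argument names of the function being mapped, which is why args uses directory and id.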

Upvotes: 0

akrun

Reputation: 887991

Here, for(i in length(id)) loops over a single value, the length of 'id', so the loop body runs only once, with i equal to the number of ids. Instead it should be

for(i in 1:length(id))

Or more correctly

for(i in seq_along(id))
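A small sketch of the difference between the three loop ranges, using a made-up id vector. The variable names are only for illustration.

```r
id <- c(1, 20, 300)

# length(id) is the single number 3, so this loop body runs once, with i = 3.
seen_length <- integer(0)
for (i in length(id)) seen_length <- c(seen_length, i)

# seq_along(id) is 1, 2, 3, so the body runs once per element.
seen_seq <- integer(0)
for (i in seq_along(id)) seen_seq <- c(seen_seq, i)

# With an empty vector, 1:length(x) counts down (1, 0) and the loop would
# still run twice, whereas seq_along(x) is empty and the loop is skipped.
empty <- integer(0)
bad_index <- 1:length(empty)
good_index <- seq_along(empty)
```

The empty-vector case is why seq_along() is the safer habit.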

In addition to the issue with looping, the if/else if is not really needed. We could use sprintf

filenames <- sprintf('%03d.csv', id)
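For illustration, with a made-up id vector: sprintf() is vectorized, so a single call pads every id to three digits and appends the extension, replacing the whole if/else ladder.

```r
id <- c(1, 20, 300)

# %03d pads each whole number with leading zeros to a width of 3.
filenames <- sprintf("%03d.csv", id)
print(filenames)
# [1] "001.csv" "020.csv" "300.csv"
```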

i.e.

data <- function(directory, id_default = 1:332){
     setwd(paste0("/Users/", directory))
     filenames <-  sprintf('%03d.csv', id_default)
     do.call(rbind, lapply(filenames, read.csv))
 }
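A possible variant that avoids setwd(), so the caller's working directory is left untouched. This is only a sketch: the name read_csv_set() is hypothetical, the directory is used exactly as given rather than prefixed with "/Users/", and the missing-file check is an extra that is not part of the original function.

```r
# Hypothetical function name; builds full paths instead of changing directory.
read_csv_set <- function(directory, id = 1:332) {
  paths <- file.path(directory, sprintf("%03d.csv", id))

  # Fail early with a readable message if any requested file is absent.
  missing <- paths[!file.exists(paths)]
  if (length(missing) > 0) {
    stop("Missing files: ", paste(missing, collapse = ", "))
  }

  # Read every file, then stack the rows into one data frame.
  do.call(rbind, lapply(paths, read.csv))
}

# Demo with made-up data written to a temporary folder.
demo_dir <- file.path(tempdir(), "csv_demo")
dir.create(demo_dir, showWarnings = FALSE)
for (k in c(1, 20, 300)) {
  write.csv(data.frame(id = k, value = k * 2),
            file.path(demo_dir, sprintf("%03d.csv", k)),
            row.names = FALSE)
}

combined <- read_csv_set(demo_dir, id = c(1, 20, 300))
```

Because rbind() is called once on the full list of data frames, nothing is overwritten along the way.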

Upvotes: 2
