anon
anon

Reputation:

for Loop in R not binding files

I'm fairly new to R, so my apologies if this is a very basic question.

I'm trying to read two Excel files in, using the list.files(pattern) method, then using a for loop to bind the files and replace values in the bound file. However, the output that my script is producing is the output from only one file, meaning that it is not binding.

The file names are fact_import_2020 and fact_import_20182019.

FilePath <- "//srdceld2/project2/"
FileNames <- list.files(path = FilePath, pattern = "fact_import_20", all.files = FALSE,
                        full.names = FALSE, recursive = FALSE,
                        ignore.case = FALSE, include.dirs = FALSE, no.. = FALSE)

FileCount <- length(FileNames)

for(i in 1:FileCount){
  MOH_TotalHC_1 <- read_excel(paste(FilePath, "/", FileNames[i], sep = ""), sheet = 1, range = cell_cols("A:I"))
  MOH_TotalHC_2 <- read_excel(paste(FilePath, "/", FileNames[i], sep = ""), sheet = 1, range = cell_cols("A:I"))
  MOH_TotalHC <- rbind(MOH_TotalHC_1, MOH_TotalHC_2)
  MOH_TotalHC <- MOH_TotalHC[complete.cases(MOH_TotalHC), ]

Upvotes: 3

Views: 201

Answers (2)

SidharthMacherla
SidharthMacherla

Reputation: 400

The potential solution is below. This solution is taken from here and seems like a duplicate question.

Potential solution:

library(readxl)
library(data.table)

#Set your path here
FilePath <- "//srdceld2/project2/"

#Update the pattern to suit your needs. Currently, its just set for XLSX files
file.list <- list.files(path = FilePath, pattern = "*.xlsx", full.names = T)
df.list <- lapply(file.list, read_excel, sheet = 1, range = cell_cols("a:i"))
attr(df.list, "names") <- file.list
names(df.list) <- file.list
setattr(df.list, "names", file.list)

#final data frame is here
dfFinal <- rbindlist(df.list, use.names = TRUE, fill = TRUE)

Assumptions and call outs:

  1. The files in the folder are similar file types. For example xlsx.
  2. The files could have different set of columns and NULLs as well.
  3. Note that the order of the columns matter and so if there are more columns in new file the number of output columns could be different.

Note: Like @Sathish, I am guessing what the input could look like

Upvotes: 0

Sathish
Sathish

Reputation: 12713

use full.names = TRUE in list.files().

After this, make sure FileNames has full path of the files.

Then loop through the filenames, instead of filecount.


I think, you are trying to do this. I am guessing here. Please see below.

You are getting data from one file, because you are overwriting the data from file-2 with data from file-1. The for() loop is indicating it.

FileNames <- list.files(path = FilePath, pattern = "fact_import_20", all.files = FALSE,
                        full.names = TRUE, recursive = FALSE,
                        ignore.case = FALSE, include.dirs = FALSE, no.. = FALSE)

# list of data from excell files
df_lst <- lapply(FileNames, function(fn){
  read_excel(fn, sheet = 1, range = cell_cols("A:I"))
})

# combine both data
MOH_TotalHC <- do.call('rbind', df_lst)
# complete cases
MOH_TotalHC[complete.cases(MOH_TotalHC), ]

Upvotes: 3

Related Questions