Reputation: 81

R: Loop for importing multiple xls as df, rename column of one df and then merge all df's

The below is driving me a little crazy and I'm sure there's an easy solution.

I currently use R to perform some calculations from a bunch of excel files, where the files are monthly observations of financial data. The files all have the exact same column headers. Each file gets imported, gets some calcs done on it and the output is saved to a list. The next file is imported and the process is repeated. I use the following code for this:

library(xlsx)  # provides read.xlsx()

filelist <- list.files(pattern = "\\.xls")
universe_list <- list()

count <- 1
for (file in filelist) {
  df <- read.xlsx(file, 1, startRow=2, header=TRUE)
  # ... perform calcs ...
  universe_list[[count]] <- df
  count <- count + 1
}

I now have a problem where some of the new operations I want to perform would involve data from two or more excel files. So for example, I would need to import the Jan-16 and the Jan-15 excel files, perform whatever needs to be done, and then move on to the next set of files (Feb-16 and Feb-15). The files will always be a fixed interval apart (e.g. one year).

I can't seem to figure out the code for this. From a process perspective, I'm thinking I need to 1) design a loop to import both sets of files at the same time, 2) create two dataframes from the imported data, 3) rename the columns of one of the dataframes (so the columns can be distinguished), 4) merge both dataframes together, and 5) perform the calcs. I can't work out the code for steps 1-4!

Many thanks for helping out

Upvotes: 1

Answers (2)

Parfait

Reputation: 107587

Consider mapply() to handle both data frame pairs together. Your current loop is reminiscent of the for loop operations used in other languages. However, R has many vectorized approaches to iterate over lists. The code below assumes the 15 and 16 lists of files are the same length, with corresponding months in both, and that the year abbreviation comes right before the file extension (i.e., -15.xls, -16.xls):

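# path is the directory holding the files; "[15]" would be a character class
# matching a single "1" or "5", so match the literal "-15" / "-16" instead
files15list <- list.files(path, pattern = "-15\\.xls", full.names = TRUE)
files16list <- list.files(path, pattern = "-16\\.xls", full.names = TRUE)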

dfprocess <- function(x, y){
                df1 <- read.xlsx(x, 1, startRow=2, header=TRUE) 
                names(df1) <- paste0(names(df1), "1")            # SUFFIX COLS WITH 1

                df2 <- read.xlsx(y, 1, startRow=2, header=TRUE) 
                names(df2) <- paste0(names(df2), "2")            # SUFFIX COLS WITH 2

                df <- cbind(df1, df2)                            # CBIND DFs
                # ... perform calcs ...
                return(df)  
             }

wide_list <- mapply(dfprocess, files15list, files16list)        

long_list <- lapply(1:ncol(wide_list),                          
                    function(i) wide_list[,i])                   # ALTERNATE OUTPUT
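
If the rows of the two files are not guaranteed to be in the same order, merge() on a shared key may be safer than cbind(), and its suffixes argument renames the clashing columns for you. A minimal sketch, assuming a hypothetical key column id and made-up in-memory data standing in for the imported Excel files:

```r
# Hypothetical stand-ins for two imported files (e.g. Jan-15 and Jan-16)
df15 <- data.frame(id = c("A", "B", "C"), price = c(10, 20, 30))
df16 <- data.frame(id = c("C", "A", "B"), price = c(33, 11, 22))

# merge() matches rows on the key and suffixes the columns that clash
merged <- merge(df15, df16, by = "id", suffixes = c("_15", "_16"))

# Example calc: year-over-year change per id
merged$change <- merged$price_16 - merged$price_15
```

The id, price and change names are only for illustration; swap in whatever column identifies a row in your files.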

Upvotes: 1

user4349490

Reputation: 153

First sort your filelist so that the two files you want to use in each calculation are next to each other. After that, try this:

# step through the sorted list two files at a time
for (count in seq(1, length(filelist) - 1, by = 2)) {
             df <- read.xlsx(filelist[count], 1, startRow=2, header=TRUE)
             df1 <- read.xlsx(filelist[count+1], 1, startRow=2, header=TRUE)
             # change column names and apply merge or append depending on requirement
             # ... perform calcs ...
             # ... save ...
     }
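
For the sorting step, one possible approach is to order by month first and year second. This is only a sketch, and it assumes hypothetical file names of the form Mon-YY.xls (Jan-15.xls, Jan-16.xls and so on), which may not match your actual naming:

```r
# Hypothetical file names of the form Mon-YY.xls
filelist <- c("Feb-16.xls", "Jan-15.xls", "Jan-16.xls", "Feb-15.xls")

# Split each name into its month abbreviation and two-digit year
month <- sub("-.*$", "", filelist)
year  <- as.integer(sub("^.*-([0-9]{2})\\.xls$", "\\1", filelist))

# Order by calendar month, then by year, so each pair sits side by side
filelist <- filelist[order(match(month, month.abb), year)]
```

month.abb is the built-in vector of English month abbreviations, so match() turns "Jan" into 1, "Feb" into 2, etc. A plain alphabetical sort would not work here because it would put Feb before Jan.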

Upvotes: 0
