How can I have my code run through different excel files and export unique files of results in R?

Question

I have six excel files, all with the same format and same number of variables (but varying observations). I have a piece of code that I need to run them all through and export the results (p values from stat test), but I am wondering how I can have the code run through all six files and export six unique files.

The code works great, but I am having to upload each file separately and export it under different names each time.

data <- MorePercent & MoreCount & LessPercent & LessCount


#Data
    Type   Neutrophils  Monocytes   NKC .....
    ------------------------------------------
    IN       546          2663      545
    IN       0797         7979      008
    OUT      0899         3899      345
    OUT      6868         44533     689
    HC       9898         43443     563


#Kruskal and export 
PV <- as.data.frame(apply(data[-1], 2, 
              function(x) kruskal.test(x = x, g = data$Type)$p.value))

#rename col
colnames(PV) <- "PVal"
#export
write.xlsx(PV, "C:/Users/sxa074/Desktop/P01/Science Trans Med Follow Up/test2.xlsx")


#Post Hoc and export
PV1 <- as.data.frame(apply(data[-1], 2, 
           function(x) pairwise.wilcox.test(x = x, g = data$Type, p.adjust.method = "BH")$p.value))

#Omit NA row
PV1 <- na.omit(PV1)

#Rename Rows
row.names(PV1) <- c("HC vs In", "HC vs OUT", "IN vs OUT")
#Export
write.xlsx(PV1, "C:/Users/sxa074/Desktop/P01/Science Trans Med Follow Up/test2A.xlsx")

I am currently exporting two excel files for every original file (Kruskal P values and Wilcox P values), but I have to go in manually and adjust the names every time. I want the code to run through all of them and export unique files.

Many thanks in advance.

David O · Accepted Answer

A for loop seems appropriate here. This solution assumes that your data are in one directory and that your results are to be placed in another. However, they they can be in the same directory. In this solution, the p-values from the Kruskal-Wallis test and the p-values from the pairwise Wilcoxon test are combined into a single table that will be saved as an Excel file.

In case you're interested in consolidating the results, you could use the sheetName option in write.xlsx() to create a single Excel file with multiple sheets.

In this example, you'll (of course) need to replace the placeholder "<...>" with appropriate parameters for your situation.

# Define directories of data and results
  dataPath <- 
  resultsPath <- 

# Vector of file names; a pattern for old and new Excel files is "xlsx?$"
  ff <- list.files(path = dataPath, pattern = "", full = TRUE)

# Good old fashioned for loop
  for (f in ff) {
    data <- read.xslx(f, ) # edit as required
    pv1 <- apply(data[-1], 2, function(x) kruskal.test(x, data$Type)$p.value)
    pv1 <- t(pv1) # transpose 3 x 1 result to 1 x 3
    pv2 <- apply(data[-1], 2, function(x) pairwise.wilcox.test(x, data$Type,
      p.adjust.method = "BH")$p.value)
    pv2 <- na.omit(pv2)
    ans <- rbind(pv1, pv2)
    rownames(ans) <- c("Pval", "HC vs In", "HC vs OUT", "IN vs OUT")
    fout <- file.path(resultsPath, paste("RESULTS", basename(f), sep = "_"))
    write.xlsx(ans, fout, rowNames = TRUE)
  }

EDIT

Regarding the placeholder "<...>" variables:

For the directories, this is just the location of the files. For example, if one directory is to be used for each experiment, both data and results can be stored in the same directory. On a Windows machine, this could be something like "C:\Users\Owner\Documents\Flow Analysis\2019_0512". Both dataPath and resultsPath would be set to that value. Incidentally, you could use the following code to extract the data directory holding the file you choose. The function file.choose() allows you to select a file through the usual operating system dialog.

  dataPath <- dirname(file.choose())

For the list.files() function, see the help function for list.files (or dir). The pattern option specifies a regular expression that can be used to select only the data files. For example, if the six data files looked like "FlowFile001.xlsx", "FlowFile002.xlsx", ..., "FlowFile006.xlsx" then a regular expression pattern that specifies those files could be "Flow.*xlsx$". (Look up regex to learn about the world of regular expression if that's unfamiliar.)

The parameters for read.xslx() depend on the structure of your data in the Excel file. It could, for instance, include skipping blank rows with skipEmptyRows = TRUE.

How can I have my code run through different excel files and export unique files of results in R?

Answers (1)

EDIT

Related Questions