Reputation: 85
I have six excel files, all with the same format and same number of variables (but varying observations). I have a piece of code that I need to run them all through and export the results (p values from stat test), but I am wondering how I can have the code run through all six files and export six unique files.
The code works great, but I am having to upload each file separately and export it under different names each time.
data <- MorePercent & MoreCount & LessPercent & LessCount
#Data
Type Neutrophils Monocytes NKC .....
------------------------------------------
IN 546 2663 545
IN 0797 7979 008
OUT 0899 3899 345
OUT 6868 44533 689
HC 9898 43443 563
#Kruskal and export
PV <- as.data.frame(apply(data[-1], 2,
function(x) kruskal.test(x = x, g = data$Type)$p.value))
#rename col
colnames(PV) <- "PVal"
#export
write.xlsx(PV, "C:/Users/sxa074/Desktop/P01/Science Trans Med Follow Up/test2.xlsx")
#Post Hoc and export
PV1 <- as.data.frame(apply(data[-1], 2,
function(x) pairwise.wilcox.test(x = x, g = data$Type, p.adjust.method = "BH")$p.value))
#Omit NA row
PV1 <- na.omit(PV1)
#Rename Rows
row.names(PV1) <- c("HC vs In", "HC vs OUT", "IN vs OUT")
#Export
write.xlsx(PV1, "C:/Users/sxa074/Desktop/P01/Science Trans Med Follow Up/test2A.xlsx")
I am currently exporting two excel files for every original file (Kruskal P values and Wilcox P values), but I have to go in manually and adjust the names every time. I want the code to run through all of them and export unique files.
Many thanks in advance.
Upvotes: 0
Views: 67
Reputation: 803
A for
loop seems appropriate here. This solution assumes that your data are in one directory and that your results are to be placed in another. However, they they can be in the same directory. In this solution, the p-values from the Kruskal-Wallis test and the p-values from the pairwise Wilcoxon test are combined into a single table that will be saved as an Excel file.
In case you're interested in consolidating the results, you could use the sheetName
option in write.xlsx()
to create a single Excel file with multiple sheets.
In this example, you'll (of course) need to replace the placeholder "<...>
" with appropriate parameters for your situation.
# Define directories of data and results
dataPath <- <directory with flow data>
resultsPath <- <directory for results>
# Vector of file names; a pattern for old and new Excel files is "xlsx?$"
ff <- list.files(path = dataPath, pattern = "<pattern>", full = TRUE)
# Good old fashioned for loop
for (f in ff) {
data <- read.xslx(f, <other parameters>) # edit as required
pv1 <- apply(data[-1], 2, function(x) kruskal.test(x, data$Type)$p.value)
pv1 <- t(pv1) # transpose 3 x 1 result to 1 x 3
pv2 <- apply(data[-1], 2, function(x) pairwise.wilcox.test(x, data$Type,
p.adjust.method = "BH")$p.value)
pv2 <- na.omit(pv2)
ans <- rbind(pv1, pv2)
rownames(ans) <- c("Pval", "HC vs In", "HC vs OUT", "IN vs OUT")
fout <- file.path(resultsPath, paste("RESULTS", basename(f), sep = "_"))
write.xlsx(ans, fout, rowNames = TRUE)
}
Regarding the placeholder "<...>
" variables:
For the directories, this is just the location of the files. For example, if one directory is to be used for each experiment, both data and results can be stored in the same directory. On a Windows machine, this could be something like "C:\Users\Owner\Documents\Flow Analysis\2019_0512"
. Both dataPath
and resultsPath
would be set to that value. Incidentally, you could use the following code to extract the data directory holding the file you choose. The function file.choose()
allows you to select a file through the usual operating system dialog.
dataPath <- dirname(file.choose())
For the list.files()
function, see the help function for list.files
(or dir
). The pattern
option specifies a regular expression that can be used to select only the data files. For example, if the six data files looked like "FlowFile001.xlsx", "FlowFile002.xlsx", ..., "FlowFile006.xlsx" then a regular expression pattern that specifies those files could be "Flow.*xlsx$". (Look up regex
to learn about the world of regular expression if that's unfamiliar.)
The parameters for read.xslx()
depend on the structure of your data in the Excel file. It could, for instance, include skipping blank rows with skipEmptyRows = TRUE
.
Upvotes: 1