How do I run this command in the xlsx library?

Question

I'm trying to run a command in this tutorial on RMANOVA:

https://finnstats.com/index.php/2021/04/06/repeated-measures-of-anova-in-r/

However, when I try to run this command:

data <- read.xlsx("D:/RStudio/data.xlsx",sheetName="Sheet1")

It gives me the following error:

Error in loadWorkbook(file, password = password) : Cannot find D:/RStudio/data.xlsx

It appears that the command fails because it requires a local data file that I don't have. However, I can't find this data file anywhere on the tutorial page. Am I correct in assuming the file is missing, or is this command supposed to create the spreadsheet?

Any help would be great. Thanks!

Manuel Popp · Accepted Answer

In short

You are correct in assuming the file is missing. This command does not create a spreadsheet; it tries to read an existing spreadsheet stored at "D:/RStudio/data.xlsx".
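
If you want to confirm this before calling read.xlsx, a minimal sketch like the following could help. It only uses file.exists from base R plus the read.xlsx call from the question; the variable name xlsx_data is just an illustrative choice, and the path is the one from the question.

```r
# Path taken from the question; adjust it to wherever your file actually lives
path <- "D:/RStudio/data.xlsx"

# xlsx_data is a hypothetical name used only for this sketch
xlsx_data <- NULL

if (file.exists(path)) {
  # Only attempt to read the spreadsheet if it is really there
  xlsx_data <- xlsx::read.xlsx(path, sheetName = "Sheet1")
} else {
  message("No file found at ", path,
          ". Create or download the spreadsheet first, or simulate the data instead.")
}
```

If the message is printed, the error in the question is simply the missing file and not a problem with the xlsx package itself.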

Possible solution

You could create a dataset on your own, e.g., like this:

# Treatment A: Samples for the different time steps T0, T1, and T2
T0A <- rnorm(12, mean = 7.853, sd = 3.082)
T1A <- rnorm(12, mean = 9.298, sd = 2.090)
T2A <- rnorm(12, mean = 5.586, sd = 0.396)
# Treatment B: Samples for the different time steps T0, T1, and T2
T0B <- rnorm(12, mean = 7.853, sd = 3.082)
T1B <- rnorm(12, mean = 9.298, sd = 2.090)
T2B <- rnorm(12, mean = 5.586, sd = 0.396)

# Combine the values in a data.frame
data <- data.frame(time = c(rep("T0", 12), rep("T1", 12), rep("T2", 12),
                            rep("T0", 12), rep("T1", 12), rep("T2", 12)),
                   score = c(T0A, T1A, T2A, T0B, T1B, T2B),
                   Treatment = c(rep("A", 36), rep("B", 36))
)
# make time and Treatment factors
data$time <- as.factor(data$time)
data$Treatment <- as.factor(data$Treatment)

# Here, we are at the Summary Statistics step of the tutorial already
library(dplyr)
library(rstatix)
# Named summary_stats rather than summary to avoid masking base R's summary()
summary_stats <- data %>%
  group_by(time) %>%
  get_summary_stats(score, type = "mean_sd")
data.frame(summary_stats)

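Note that treatments A and B are drawn from exactly the same distributions in this case: the sampled values differ, but the means and standard deviations are identical. Depending on what you want to test, you can alter the mean and standard deviation of the different treatments.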
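
As a rough illustration of altering a treatment, the sketch below shifts the mean of Treatment B by a fixed amount at every time step. The shift of 2, the seed, and the names shift and data_shifted are arbitrary assumptions made for this example; they are not taken from the tutorial.

```r
set.seed(42)  # fix the seed so the simulated values are reproducible
n <- 12       # samples per time step and treatment, as above

# Treatment A keeps the means and standard deviations used above
score_A <- c(rnorm(n, mean = 7.853, sd = 3.082),
             rnorm(n, mean = 9.298, sd = 2.090),
             rnorm(n, mean = 5.586, sd = 0.396))

# Treatment B: same standard deviations, but every mean is shifted
shift <- 2    # hypothetical treatment effect, chosen arbitrarily
score_B <- c(rnorm(n, mean = 7.853 + shift, sd = 3.082),
             rnorm(n, mean = 9.298 + shift, sd = 2.090),
             rnorm(n, mean = 5.586 + shift, sd = 0.396))

data_shifted <- data.frame(
  time      = factor(rep(rep(c("T0", "T1", "T2"), each = n), times = 2)),
  score     = c(score_A, score_B),
  Treatment = factor(rep(c("A", "B"), each = 3 * n))
)

# Mean score per treatment and time step (base R, no extra packages needed)
aggregate(score ~ Treatment + time, data = data_shifted, FUN = mean)
```

With a shift like this, the group means for B should tend to be higher than those for A, which gives the later tests in the tutorial something to detect.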

Additional ideas

You could also introduce outliers to your self-created data set. In the following, I just chose the mean value of T0A and 4 times the standard deviation of T0A. Then we set a value N for how many potential outliers we want. Subsequently, we create random values that can be up to 4 standard deviations higher or lower than the mean of T0A and use those values to replace random score values within the data.frame. Because the values are drawn uniformly from that range, not every replaced value will actually be extreme, so N is the maximum number of outliers; in this case we set N = 1. Of course, this script could be adapted to set certain ranges of potential values dependent on the time and Treatment factors (but that's beyond the scope of this answer for now).

mean_value <- 7.853
extreme <- 4*3.082
N <- 1
outlier_values <- runif(N, min = mean_value - extreme, max = mean_value + extreme)
# Pick N distinct row indices uniformly at random
# (rounding runif() output would under-sample the first and last rows)
outliers <- sample(nrow(data), N)
data$score[outliers] <- outlier_values
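
The adaptation mentioned above, where the range of potential outlier values depends on the time factor, might look roughly like the following sketch. It replaces one score per time and Treatment combination, using the mean and standard deviation of the matching time step. The names sim, params, cells, and replaced are hypothetical, and the seed is arbitrary.

```r
set.seed(1)  # arbitrary seed, only for reproducibility
n <- 12

# Rebuild a data set with the same structure and parameters as above
sim <- data.frame(
  time      = factor(rep(rep(c("T0", "T1", "T2"), each = n), times = 2)),
  Treatment = factor(rep(c("A", "B"), each = 3 * n)),
  score     = c(rnorm(n, 7.853, 3.082), rnorm(n, 9.298, 2.090), rnorm(n, 5.586, 0.396),
                rnorm(n, 7.853, 3.082), rnorm(n, 9.298, 2.090), rnorm(n, 5.586, 0.396))
)

# Mean and standard deviation used to simulate each time step
params <- data.frame(time = c("T0", "T1", "T2"),
                     mean = c(7.853, 9.298, 5.586),
                     sd   = c(3.082, 2.090, 0.396))

# Row indices for every time x Treatment combination
cells <- split(seq_len(nrow(sim)), list(sim$time, sim$Treatment))

replaced <- integer(0)
for (idx in cells) {
  # Choose one row of this combination at random
  row <- idx[sample.int(length(idx), 1)]
  p <- params[params$time == as.character(sim$time[row]), ]
  extreme <- 4 * p$sd
  # Replace its score with a value within 4 SDs of that time step's mean
  sim$score[row] <- runif(1, min = p$mean - extreme, max = p$mean + extreme)
  replaced <- c(replaced, row)
}

# Inspect the rows that were replaced
sim[replaced, ]
```

As with the simpler version, the replaced values are only potential outliers, since a uniform draw from that range can also land close to the mean.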

In my opinion, a self-created data set like this is more useful than a fixed example data set, because you can change the mean and standard deviation values, introduce outliers, and so on, and then see how your statistical tests respond in various situations.
