Reputation: 3228
I'm trying to run a command in this tutorial on RMANOVA:
https://finnstats.com/index.php/2021/04/06/repeated-measures-of-anova-in-r/
However, when I try to run this command:
data <- read.xlsx("D:/RStudio/data.xlsx",sheetName="Sheet1")
It gives me the following error:
Error in loadWorkbook(file, password = password) : Cannot find D:/RStudio/data.xlsx
It appears that this is not loading because it requires a local data file that I don't have. However, I can't find this data file anywhere on the tutorial page. Am I correct in assuming the file is missing, or is this command supposed to build the spreadsheet?
Any help would be great. Thanks!
Upvotes: 0
Views: 112
Reputation: 21
Use file.choose() to pick the file interactively, for example:
data <- read.xlsx(file.choose(), sheetName = "Sheet1") # opens a file dialog so you can choose the Excel file; the sheet can also be selected by number, e.g. sheetIndex = 1 (assuming read.xlsx comes from the xlsx package)
Upvotes: 0
Reputation: 1175
In short
You are correct in assuming the file is missing. This command will not create any spreadsheets, but it will read a spreadsheet stored at "D:/RStudio/data.xlsx".
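A quick way to confirm this before calling read.xlsx is to ask R whether the file exists. The following is only a minimal sketch; the path is the one from the question and should be replaced with wherever your file actually lives.

```r
# Path taken from the question; replace it with your own location
path <- "D:/RStudio/data.xlsx"

if (file.exists(path)) {
  message("Found: ", path)
} else {
  # Relative paths are resolved against the working directory, so print it too
  message("Not found: ", path, " (working directory: ", getwd(), ")")
}
```

If the file is reported as not found, reading it will fail with the error shown in the question.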
Possible solution
You could create a dataset on your own, e.g., like this:
# Treatment A: Samples for the different time steps T0, T1, and T2
T0A <- rnorm(12, mean = 7.853, sd = 3.082)
T1A <- rnorm(12, mean = 9.298, sd = 2.090)
T2A <- rnorm(12, mean = 5.586, sd = 0.396)
# Treatment B: Samples for the different time steps T0, T1, and T2
T0B <- rnorm(12, mean = 7.853, sd = 3.082)
T1B <- rnorm(12, mean = 9.298, sd = 2.090)
T2B <- rnorm(12, mean = 5.586, sd = 0.396)
# Combine the values in a data.frame
data <- data.frame(
  time = c(rep("T0", 12), rep("T1", 12), rep("T2", 12),
           rep("T0", 12), rep("T1", 12), rep("T2", 12)),
  score = c(T0A, T1A, T2A, T0B, T1B, T2B),
  Treatment = c(rep("A", 36), rep("B", 36))
)
# make time and Treatment factors
data$time <- as.factor(data$time)
data$Treatment <- as.factor(data$Treatment)
# Here, we are at the Summary Statistics step of the tutorial already
library(dplyr)
library(rstatix)
# Named summary_stats rather than summary to avoid masking base R's summary()
summary_stats <- data %>%
  group_by(time) %>%
  get_summary_stats(score, type = "mean_sd")
data.frame(summary_stats)
Note that treatments A and B are drawn from exactly the same distributions in this case, so any difference between them is due to random sampling only. Depending on what you want to test, you can alter the mean and standard deviation of the different treatments.
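As an illustration of altering the treatments, here is a sketch in which treatment B gets higher means at T1 and T2. The shift of 2 units and the seed are arbitrary assumptions of mine, not values from the tutorial.

```r
set.seed(42)  # arbitrary seed, only so the random draws are reproducible

# Treatment A: same parameters as above
T0A <- rnorm(12, mean = 7.853, sd = 3.082)
T1A <- rnorm(12, mean = 9.298, sd = 2.090)
T2A <- rnorm(12, mean = 5.586, sd = 0.396)

# Treatment B: hypothetical means, shifted upwards by 2 at T1 and T2
T0B <- rnorm(12, mean = 7.853, sd = 3.082)
T1B <- rnorm(12, mean = 11.298, sd = 2.090)
T2B <- rnorm(12, mean = 7.586, sd = 0.396)

# Same layout as before: 12 values per time step, A first, then B
data <- data.frame(
  time = factor(rep(rep(c("T0", "T1", "T2"), each = 12), times = 2)),
  score = c(T0A, T1A, T2A, T0B, T1B, T2B),
  Treatment = factor(rep(c("A", "B"), each = 36))
)

# Group means per treatment and time step (base R, no extra packages)
aggregate(score ~ Treatment + time, data = data, FUN = mean)
```

With only 12 values per group, the observed group means will still scatter around the values you set.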
Additional ideas
You could also introduce outliers into your self-created data set. In the following, I use the mean of T0A and 4 times the standard deviation of T0A. We then set a value N for how many potential outliers we want. Subsequently, we create random values that can be up to 4 standard deviations higher or lower than the mean of T0A and use those values to replace random score values within the data.frame. In this case we set a maximum of N = 1 outlier. Of course, this script could be adapted to set certain ranges of potential values dependent on the time and Treatment factors (but that's beyond the scope of this answer for now).
mean_value <- 7.853
extreme <- 4*3.082
N <- 1
outlier_values <- runif(N, min = mean_value - extreme, max = mean_value + extreme)
outliers <- sample(nrow(data), N) # pick N distinct row indices, each row equally likely
data$score[outliers] <- outlier_values
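Because the replacement value is drawn uniformly within 4 standard deviations of the mean, it will often land close to the mean and not look like an outlier at all. A rough way to check what would be flagged is the boxplot rule (values more than 1.5 times the interquartile range beyond the hinges). The sketch below uses a stand-in score vector and a deliberately extreme value (mean + 6 sd); both are my own assumptions for illustration only.

```r
set.seed(1)  # arbitrary seed for reproducibility

# Stand-in for the score column, using the T0 parameters from above
score <- rnorm(72, mean = 7.853, sd = 3.082)

# Inject one deliberately extreme value (hypothetical choice: mean + 6 sd)
idx <- sample(length(score), 1)
score[idx] <- 7.853 + 6 * 3.082

# Values a boxplot would draw as individual points
flagged <- boxplot.stats(score)$out
flagged
```

The same call can be applied to data$score after running the outlier code above.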
In my opinion, this data set is more useful than any example data set, because you can change the mean and standard deviation values, introduce outliers, etc., and then experiment with the data to see how your statistical tests respond in various situations.
Upvotes: 1