Reputation: 1
I have a large dataset for an independent-samples t-test with two factors, one of which is Gender. I want to check the normality of each variable within each group to decide on the next step. So I took the following script, which I found on this forum, and made some modifications.
for (i in 9:ncol(AF)) {
print(names(AF)[i])
print(AF %>%
group_by(Gender) %>%
summarise(`W Statistic` = ifelse(sd(AF[, i])!=0,
shapiro.test(AF[, i])$statistic,NA),
`p-value` = ifelse(sd(AF[, i])!=0,
shapiro.test(AF[, i])$p.value,NA)))
}
The result for the first variable (R_44) was as follows:
## [1] "R_44"
## # A tibble: 2 × 3
## Gender Statistic `p-value`
## <fct> <dbl> <dbl>
## 1 F 0.560 9.31e-10
## 2 M 0.560 9.31e-10
I remembered checking the normality of this variable in JASP at the beginning of my work, and the result there was different:
## 1 F 0.465 1.559e -7
## 2 M 0.623 5.149e -6
I repeated the test in R for the same variable without the loop function as below:
shapiro.test(AF$R_44[AF$Gender == "F"])
shapiro.test(AF$R_44[AF$Gender == "M"])
The results were:
data: AF$R_44[AF$Gender == "F"]
W = 0.46505, p-value = 1.559e-07
data: AF$R_44[AF$Gender == "M"]
W = 0.62303, p-value = 5.149e-06
These are similar to the JASP results. Therefore, I assume there is a mistake in the first script above, but I am not sure where it is. Need help here!
Upvotes: 0
Views: 193
Reputation: 388817
AF[, i] is subsetting data from the entire dataframe and does not take into consideration the grouping by Gender. You may use cur_data() to subset data from the current group. Also, since sd returns a single value, it is better to use if/else instead of vectorised ifelse.
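To illustrate the point about grouping, here is a minimal sketch on made-up data (the `demo` data frame and its `x` column are hypothetical, not from the question). Referring to the data frame by name inside summarise returns the whole column for every group, while a bare column name is evaluated within the current group.

```r
library(dplyr)

# Hypothetical toy data, only to show the difference in scoping.
demo <- data.frame(
  Gender = c("F", "F", "M", "M"),
  x      = c(1, 2, 10, 20)
)

res <- demo %>%
  group_by(Gender) %>%
  summarise(
    whole     = mean(demo[, "x"]),  # ignores the grouping: full column each time
    per_group = mean(x)             # evaluated on the current group's rows only
  )

print(res)
# whole is 8.25 for both F and M; per_group is 1.5 for F and 15 for M.
```

The identical W and p-value for F and M in the question are the same effect: shapiro.test was run on the full column twice.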
Something like this should work, though I can't test it since I don't have the data.
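library(dplyr)

for (i in 9:ncol(AF)) {
  col_name <- names(AF)[i]
  print(col_name)
  print(AF %>%
    group_by(Gender) %>%
    # Look the column up by name: cur_data() drops the grouping column,
    # so positional indices may not line up with those of AF.
    summarise(`W Statistic` = if (sd(cur_data()[[col_name]]) != 0)
                shapiro.test(cur_data()[[col_name]])$statistic else NA,
              `p-value` = if (sd(cur_data()[[col_name]]) != 0)
                shapiro.test(cur_data()[[col_name]])$p.value else NA))
}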
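As an alternative, here is a self-contained sketch that uses the .data pronoun instead of cur_data() (which is deprecated as of dplyr 1.1.0). The `demo` data frame and the `shapiro_by_group` helper are hypothetical names made up for illustration, and the column names are only borrowed from the question; treat it as one possible approach, not a tested solution for the real data.

```r
library(dplyr)

# Hypothetical stand-in for AF: one grouping factor and two numeric columns.
set.seed(1)
demo <- data.frame(
  Gender = factor(rep(c("F", "M"), each = 20)),
  R_44   = c(rexp(20), rnorm(20)),
  R_45   = c(runif(20), rexp(20))
)

# Hypothetical helper: Shapiro-Wilk W and p-value per group for one column.
shapiro_by_group <- function(data, col_name, group = "Gender") {
  data %>%
    group_by(across(all_of(group))) %>%
    summarise(
      # .data[[col_name]] is the current group's slice of the column.
      # Groups that are constant (or whose sd is NA) get NA instead of an error.
      W = if (isTRUE(sd(.data[[col_name]]) > 0))
        shapiro.test(.data[[col_name]])$statistic else NA_real_,
      p_value = if (isTRUE(sd(.data[[col_name]]) > 0))
        shapiro.test(.data[[col_name]])$p.value else NA_real_,
      .groups = "drop"
    )
}

# Loop over the columns of interest, as in the question.
for (col_name in c("R_44", "R_45")) {
  print(col_name)
  print(shapiro_by_group(demo, col_name))
}
```

Each row should then match what shapiro.test gives on the manually subsetted vector, e.g. shapiro.test(demo$R_44[demo$Gender == "F"]).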
Upvotes: 1