I'm working on my master's thesis right now. I have 2 groups: a usual-showering group and a cold-shower group. Variables are age, gender, weight, psychological wellbeing, physiological wellbeing, sleep quality, movement behavior, skin texture, shower behavior, etc.
head(data1)
Code Gruppe StudentBasel Alter Grösse Gewicht0W Gewicht12W
1 TURN12 2 Ja 50 159 70 72
2 AMMN17 1 Nein 26 164 52 50
3 LKPG08 2 Nein 19 167 54 NA
4 LJRn05 2 Nein 22 180 60 NA
5 AGBD08 1 Nein 24 165 49 NA
6 IUGH20 2 Nein 32 168 54 NA
Geschlecht WHO1W WHO4W WHO8W WHO12W FEW1W FEW4W FEW8W FEW12W
1 w 6 21 24 25 87 70 80 75
2 w 24 22 25 22 77 78 83 74
3 w 16 NA NA NA 65 NA NA NA
4 w 19 NA NA NA 61 NA NA NA
5 w 23 18 22 NA 61 61 56 NA
6 w 22 NA NA NA 66 NA NA NA
SchlafA1W SchlafA4W SchlafA8W SchlafA12W SchlafWT1W SchlafWT4W
1 32 25 25 30 49 32
2 35 31 35 28 46 43
3 28 NA NA NA 31 NA
4 23 NA NA NA 32 NA
5 27 28 26 NA 35 34
6 27 NA NA NA 41 NA
So I have two groups and data from the 4th, 8th and 12th week. I want to compare the groups' means at week 4. I was advised against running a t-test for every variable because of some error I'm not accounting for, so I thought I'd use an ANOVA like this:
CSSAUW4 <- aov(formula = Gruppe ~ WHO4W + FEW4W + Dauer4W + SchlafA4W + SchlafWT4W + Einschlafzeit4W + Schwitzen + Haut4W + KHaut4W + Abwesenheit4W + Krankheitssymptome4W + Duschhäufigkeit4W,
data = Group4W)
So I got all my results and was pretty happy, but I couldn't run a TukeyHSD() test because "Gruppe" was not a factor. So I converted it to a factor with as.factor(), but now the ANOVA no longer runs. Apparently I did it all wrong and should have used aov(numeric variable ~ group) to compare everything, but then I have the same problem as with the t-tests: writing separate code for every single variable.
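For reference, the orientation described in this paragraph looks like this for a single variable. This is a minimal sketch on simulated data: the data frame `sim` and its values are hypothetical stand-ins for the real Group4W data. The outcome goes on the left, Gruppe is converted to a factor and goes on the right, and TukeyHSD() then runs.

```r
# Hypothetical stand-in for the real data: 50 participants in two groups
set.seed(1)
sim <- data.frame(
  Gruppe = rep(1:2, each = 25),
  WHO4W  = c(rnorm(25, mean = 20, sd = 4), rnorm(25, mean = 22, sd = 4))
)

# Gruppe is coded 1/2, so turn it into a factor before modelling
sim$Gruppe <- factor(sim$Gruppe)

# Outcome on the left, grouping factor on the right
fit <- aov(WHO4W ~ Gruppe, data = sim)
summary(fit)

# Works because Gruppe is a factor; with two groups there is a single comparison
tuk <- TukeyHSD(fit, "Gruppe")
tuk
```

Putting Gruppe on the left-hand side instead treats the group code as the response, which is why converting it to a factor breaks that model.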
So I read something about ANOVAs with lme4, but I find it really difficult to work out how to code it for my data, since I successfully dodged every R course at my university. I'd like something simple like Test(Group ~ variable1, variable2, variable3, data=data1) and that's it, for week 4, week 8 and week 12.
I was thinking of using lm(group ~ variable1, variable2, etc.) instead. Would that be possible, and would it make sense for my data? I doubt my statistical intuition is right on that one :D
Second question: my dataset is small (90% loss to follow-up by week 12), so at the moment I have only 8 participants in each group. Can I do the same mean comparison at week 12 as at week 4 (with 25 participants each)?
Help would be really appreciated!!
Greetings Christian
Example data:
set.seed(100)
data1 = data.frame(
Code =sample(letters,100,replace=TRUE),
Gruppe=sample(1:2,100,replace=TRUE),
matrix(rpois(100*11,100),nrow=100))
colnames(data1)[-c(1:2)] = c("StudentBasel","Alter","Grösse",
"WHO1W","WHO4W","WHO8W","WHO12W","FEW1W","FEW4W","FEW8W","FEW12W")
You can select the columns you want to test:
test_columns = c("WHO4W","WHO8W","WHO12W")
So, if you want to test, say, weeks 4, 8 and 12 of the WHO series together, the select command picks out Gruppe plus the columns you want to test, and pivot_longer() reshapes them:
library(tidyr)
library(dplyr)
library(broom)
data1 %>%
select(Gruppe, all_of(test_columns)) %>%
pivot_longer(-Gruppe)
# A tibble: 300 x 3
Gruppe name value
<int> <chr> <int>
1 2 WHO4W 97
2 2 WHO8W 91
3 2 WHO12W 93
4 1 WHO4W 99
5 1 WHO8W 103
6 1 WHO12W 92
7 2 WHO4W 91
8 2 WHO8W 111
9 2 WHO12W 120
10 1 WHO4W 119
# … with 290 more rows
In the step above, every week's value is paired with its corresponding Gruppe, giving one row per participant and week; this is called pivoting a table into long format.
What you want is a test of Gruppe within every variable. You can do this by grouping first (group_by) and then running the aov inside do(), which fits one aov per group:
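If you'd rather not depend on tidyr, base R's stack() gives the same long format. A minimal sketch reusing the example data; the data frame name `long` is my own. Note that stack() calls the columns `values` and `ind` rather than `value` and `name`, and orders rows by column rather than by participant, which does not matter for a per-variable test.

```r
# Recreate the example data
set.seed(100)
data1 <- data.frame(
  Code   = sample(letters, 100, replace = TRUE),
  Gruppe = sample(1:2, 100, replace = TRUE),
  matrix(rpois(100 * 11, 100), nrow = 100)
)
colnames(data1)[-c(1:2)] <- c("StudentBasel", "Alter", "Grösse",
  "WHO1W", "WHO4W", "WHO8W", "WHO12W", "FEW1W", "FEW4W", "FEW8W", "FEW12W")
test_columns <- c("WHO4W", "WHO8W", "WHO12W")

# stack() puts the selected columns one under another:
# `values` holds the measurements, `ind` records which column each came from
long <- stack(data1[test_columns])

# stack() drops Gruppe, so repeat it once per stacked column
long$Gruppe <- rep(data1$Gruppe, times = length(test_columns))
head(long)
```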
result = data1 %>%
select(Gruppe, all_of(test_columns)) %>%
pivot_longer(-Gruppe) %>%
group_by(name) %>%
do(tidy(aov(value ~ Gruppe,data=.)))
# A tibble: 6 x 7
# Groups: name [3]
name term df sumsq meansq statistic p.value
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 WHO12W Gruppe 1 131. 131. 1.25 0.266
2 WHO12W Residuals 98 10247. 105. NA NA
3 WHO4W Gruppe 1 111. 111. 1.01 0.316
4 WHO4W Residuals 98 10740. 110. NA NA
5 WHO8W Gruppe 1 1.63 1.63 0.0169 0.897
6 WHO8W Residuals 98 9428. 96.2 NA NA
Now we keep only the Gruppe terms, since we are not interested in the residuals:
result %>% filter(term=="Gruppe")
# A tibble: 3 x 7
# Groups: name [3]
name term df sumsq meansq statistic p.value
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 WHO12W Gruppe 1 131. 131. 1.25 0.266
2 WHO4W Gruppe 1 111. 111. 1.01 0.316
3 WHO8W Gruppe 1 1.63 1.63 0.0169 0.897
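The same per-variable test can be done in base R without do(), which dplyr documents as superseded. A minimal sketch reusing the example data; `anova_table` is a hypothetical name, and reformulate() builds the formula `<column> ~ factor(Gruppe)` from each column name. With only two groups, each of these ANOVAs is equivalent to a pooled-variance two-sample t-test (F = t² and identical p-values).

```r
# Recreate the example data
set.seed(100)
data1 <- data.frame(
  Code   = sample(letters, 100, replace = TRUE),
  Gruppe = sample(1:2, 100, replace = TRUE),
  matrix(rpois(100 * 11, 100), nrow = 100)
)
colnames(data1)[-c(1:2)] <- c("StudentBasel", "Alter", "Grösse",
  "WHO1W", "WHO4W", "WHO8W", "WHO12W", "FEW1W", "FEW4W", "FEW8W", "FEW12W")
test_columns <- c("WHO4W", "WHO8W", "WHO12W")

# One ANOVA per outcome column: <column> ~ factor(Gruppe)
anova_table <- do.call(rbind, lapply(test_columns, function(col) {
  fit <- aov(reformulate("factor(Gruppe)", response = col), data = data1)
  tab <- summary(fit)[[1]]  # row 1 = Gruppe, row 2 = Residuals
  data.frame(name      = col,
             df        = tab[["Df"]][1],
             statistic = tab[["F value"]][1],
             p.value   = tab[["Pr(>F)"]][1])
}))
anova_table
```

Because Gruppe has only two levels, factor(Gruppe) and the integer Gruppe give the same 1-df test, so the statistic and p.value columns should match the tidy(aov(...)) output for the same data.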
I suggest the approach above because it is easier to explain what you have done (you cannot just say "I did an ANOVA") and easier to interpret. You can also fit one big aov and run a post-hoc test, but please read up on what the ANOVA is doing before applying this:
# pivot to long format as before
aov_df = data1 %>%
select(Gruppe, all_of(test_columns)) %>%
pivot_longer(-Gruppe)
# one subgroup per week and group, e.g. WHO4W1 (group 1, week 4), WHO4W2 (group 2, week 4) and so on
aov_df$subgroup = factor(paste0(aov_df$name,aov_df$Gruppe))
result = TukeyHSD(aov(value ~ subgroup,data=aov_df))
# the below are the meaningful comparisons you need:
result$subgroup[c("WHO12W2-WHO12W1","WHO4W2-WHO4W1","WHO8W2-WHO8W1"),]
diff lwr upr p adj
WHO12W2-WHO12W1 2.2938808 -3.560239 8.148000 0.8711455
WHO4W2-WHO4W1 2.1151369 -3.738983 7.969256 0.9052955
WHO8W2-WHO8W1 -0.2560386 -6.110158 5.598081 0.9999956
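With six subgroups, TukeyHSD() adjusts for all 15 pairwise comparisons (including ones such as WHO4W1 vs WHO12W2 that are not of interest), so the three p-values picked out above are more conservative than they need to be. If only the three between-group comparisons matter, one alternative, swapped in here and not part of the approach above, is to adjust just those three p-values, for example with Holm's method via p.adjust(). A minimal sketch reusing the example data; `p_raw` and `p_holm` are hypothetical names, and each raw p-value comes from a pooled-variance t-test for that week alone.

```r
# Recreate the example data
set.seed(100)
data1 <- data.frame(
  Code   = sample(letters, 100, replace = TRUE),
  Gruppe = sample(1:2, 100, replace = TRUE),
  matrix(rpois(100 * 11, 100), nrow = 100)
)
colnames(data1)[-c(1:2)] <- c("StudentBasel", "Alter", "Grösse",
  "WHO1W", "WHO4W", "WHO8W", "WHO12W", "FEW1W", "FEW4W", "FEW8W", "FEW12W")
test_columns <- c("WHO4W", "WHO8W", "WHO12W")

# Unadjusted p-value for group 1 vs group 2 in each week
p_raw <- sapply(test_columns, function(col) {
  t.test(reformulate("Gruppe", response = col), data = data1,
         var.equal = TRUE)$p.value
})

# Holm adjustment over just these three comparisons
p_holm <- p.adjust(p_raw, method = "holm")
cbind(p_raw, p_holm)
```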