Christian De Pierro

Reputation: 13

Comparing two groups on multiple variables using ANOVA or another testing method in R

I am working on my master's thesis right now. I have 2 groups: a showering-as-usual group and a cold-shower group. The variables are age, gender, weight, psychological wellbeing, physiological wellbeing, sleep quality, movement behavior, skin texture, shower behavior, etc.

head(data1)

   Code Gruppe StudentBasel Alter Grösse Gewicht0W Gewicht12W
1 TURN12      2           Ja    50    159        70         72
2 AMMN17      1         Nein    26    164        52         50
3 LKPG08      2         Nein    19    167        54         NA
4 LJRn05      2         Nein    22    180        60         NA
5 AGBD08      1         Nein    24    165        49         NA
6 IUGH20      2         Nein    32    168        54         NA
  Geschlecht WHO1W WHO4W WHO8W WHO12W FEW1W FEW4W FEW8W FEW12W
1          w     6    21    24     25    87    70    80     75
2          w    24    22    25     22    77    78    83     74
3          w    16    NA    NA     NA    65    NA    NA     NA
4          w    19    NA    NA     NA    61    NA    NA     NA
5          w    23    18    22     NA    61    61    56     NA
6          w    22    NA    NA     NA    66    NA    NA     NA
  SchlafA1W SchlafA4W SchlafA8W SchlafA12W SchlafWT1W SchlafWT4W
1        32        25        25         30         49         32
2        35        31        35         28         46         43
3        28        NA        NA         NA         31         NA
4        23        NA        NA         NA         32         NA
5        27        28        26         NA         35         34
6        27        NA        NA         NA         41         NA

So, I have two groups and data from the 4th, 8th and 12th week. I want to compare the group means at the 4th week. Running t-tests for every variable was not recommended because of some error I'm not considering. So I thought I'd use an ANOVA like this:

CSSAUW4 <- aov(formula = Gruppe ~ WHO4W + FEW4W + Dauer4W + SchlafA4W + SchlafWT4W + Einschlafzeit4W + Schwitzen + Haut4W + KHaut4W + Abwesenheit4W + Krankheitssymptome4W + Duschhäufigkeit4W,
              data = Group4W)

So I got all my results and was pretty happy, but I wasn't able to run a TukeyHSD() test, because "Group" was not a factor. So I changed it to a factor with as.factor(), but now I can't calculate my ANOVA anymore. Apparently I did it all wrong and should have used aov(numeric variable ~ group) to compare everything, but then I have the same problem as with the t-test variant: writing code for every single variable.

So I read something about lme4 ANOVAs, but I find it really difficult to understand how to code it for my data, since I successfully dodged every R course at my university. I'd like to have some simple code like: Test(Group ~ variable1, variable2, variable3, data=data1) and that's it. For Week4, Week8, Week12.

I was thinking of using lm(group ~ variable1, variable2, etc.) instead. Would that be possible and make sense for my data? I doubt my statistical reasoning is right on that one :D

Second question: I have the problem of a small dataset (loss to follow-up of 90% by the 12th week). So at the moment I have only 8 participants in each group. Can I do the same mean comparison for the 12th week as for the 4th week (with 25 participants each)?

Help would be really appreciated!!

Greetings Christian

Upvotes: 1

Views: 1357

Answers (1)

StupidWolf

Reputation: 46978

Example data:

set.seed(100)
data1 = data.frame(
Code =sample(letters,100,replace=TRUE),
Gruppe=sample(1:2,100,replace=TRUE),
matrix(rpois(100*11,100),nrow=100)) 
colnames(data1)[-c(1:2)] = c("StudentBasel","Alter","Grösse",
"WHO1W","WHO4W","WHO8W","WHO12W","FEW1W","FEW4W","FEW8W","FEW12W") 

You can select the columns you want to test:

test_columns = c("WHO4W","WHO8W","WHO12W")

So, if you want to test, say, weeks 4, 8 and 12 together for the WHO series, the select command picks out the columns you want to test:

library(tidyr)
library(dplyr)
library(broom)

data1 %>% 
select(c("Gruppe",test_columns)) %>% 
pivot_longer(-Gruppe)

# A tibble: 300 x 3
   Gruppe name   value
    <int> <chr>  <int>
 1      2 WHO4W     97
 2      2 WHO8W     91
 3      2 WHO12W    93
 4      1 WHO4W     99
 5      1 WHO8W    103
 6      1 WHO12W    92
 7      2 WHO4W     91
 8      2 WHO8W    111
 9      2 WHO12W   120
10      1 WHO4W    119
# … with 290 more rows

In the above step, I paired every weekly measurement with its corresponding Gruppe; this is called pivoting a table into long format.

What you want is a test for Gruppe within every variable. You can do that by grouping first (group_by) and then running the aov inside a "do", which means: run aov on every group:

result = data1 %>% 
select(c("Gruppe",test_columns)) %>% 
pivot_longer(-Gruppe) %>% 
group_by(name) %>% 
do(tidy(aov(value ~ Gruppe,data=.))) 

# A tibble: 6 x 7
# Groups:   name [3]
  name   term         df    sumsq meansq statistic p.value
  <chr>  <chr>     <dbl>    <dbl>  <dbl>     <dbl>   <dbl>
1 WHO12W Gruppe        1   131.   131.      1.25     0.266
2 WHO12W Residuals    98 10247.   105.     NA       NA    
3 WHO4W  Gruppe        1   111.   111.      1.01     0.316
4 WHO4W  Residuals    98 10740.   110.     NA       NA    
5 WHO8W  Gruppe        1     1.63   1.63    0.0169   0.897
6 WHO8W  Residuals    98  9428.    96.2    NA       NA    

Now we simply keep the terms that contain Gruppe, since we are not interested in the residuals:

result %>% filter(term=="Gruppe")
# A tibble: 3 x 7
# Groups:   name [3]
  name   term      df  sumsq meansq statistic p.value
  <chr>  <chr>  <dbl>  <dbl>  <dbl>     <dbl>   <dbl>
1 WHO12W Gruppe     1 131.   131.      1.25     0.266
2 WHO4W  Gruppe     1 111.   111.      1.01     0.316
3 WHO8W  Gruppe     1   1.63   1.63    0.0169   0.897
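Since the question mentions the concern about running many separate tests, here is a minimal base-R sketch of adjusting the per-variable p-values for multiple comparisons. It is only an illustration under assumptions: the data are simulated as in the example above, the Holm method is one possible choice among those offered by p.adjust, and the names p_raw and p_holm are made up for this example.

```r
# Simulated data (assumption: same shape as the example data above)
set.seed(100)
data1 <- data.frame(
  Gruppe = sample(1:2, 100, replace = TRUE),
  WHO4W  = rpois(100, 100),
  WHO8W  = rpois(100, 100),
  WHO12W = rpois(100, 100)
)
test_columns <- c("WHO4W", "WHO8W", "WHO12W")

# Fit aov(variable ~ factor(Gruppe)) per column and pull out the p-value for Gruppe
p_raw <- sapply(test_columns, function(v) {
  fit <- aov(reformulate("factor(Gruppe)", response = v), data = data1)
  summary(fit)[[1]][["Pr(>F)"]][1]
})

# Holm adjustment across all tested variables (hypothetical choice of method)
p_holm <- p.adjust(p_raw, method = "holm")

data.frame(variable = test_columns, p_raw = p_raw, p_holm = p_holm)
```

With the grouped result tibble from the pipeline, the same idea would need an ungroup() before calling p.adjust on the p.value column; otherwise the adjustment would be computed within each group of one test and change nothing.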

I suggest the per-variable approach above because it is easier to explain to people what you have done (you cannot just say "I did an anova"), and it is easier to interpret. You can use one big aov and do a post hoc test, but please read up on and understand what ANOVA is doing before applying this:

#pivot long like before
aov_df = data1 %>% 
select(c("Gruppe",test_columns)) %>% 
pivot_longer(-Gruppe)
# now we have a subgroup for every measurement, e.g. group 1 + wk4, group 2 + wk4 and so on
aov_df$subgroup = paste0(aov_df$name,aov_df$Gruppe)

result = TukeyHSD(aov(value ~ subgroup,data=aov_df))
# the below are the meaningful comparisons you need:
result$subgroup[c("WHO12W2-WHO12W1","WHO4W2-WHO4W1","WHO8W2-WHO8W1"),]
                      diff       lwr      upr     p adj
WHO12W2-WHO12W1  2.2938808 -3.560239 8.148000 0.8711455
WHO4W2-WHO4W1    2.1151369 -3.738983 7.969256 0.9052955
WHO8W2-WHO8W1   -0.2560386 -6.110158 5.598081 0.9999956
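On the original TukeyHSD() problem from the question: the function needs the grouping variable to be a factor on the right-hand side of the formula, with the numeric measurement on the left. A minimal sketch for a single variable, assuming simulated data and using df, fit and tuk as illustrative names:

```r
# Simulated data (assumption: Gruppe coded 1/2 as in the question)
set.seed(100)
df <- data.frame(
  Gruppe = sample(1:2, 100, replace = TRUE),
  WHO4W  = rpois(100, 100)
)

# Convert the grouping column to a factor so it is treated as categorical
df$Gruppe <- factor(df$Gruppe)

# Measurement on the left, group on the right
fit <- aov(WHO4W ~ Gruppe, data = df)

# TukeyHSD now has a factor to compare; the row "2-1" is group 2 minus group 1
tuk <- TukeyHSD(fit)
tuk$Gruppe
```

The formula in the question had Gruppe on the left-hand side, which is why converting it to a factor made aov fail: the response of an aov has to be numeric.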

Upvotes: 3
