thanks_in_advance

Reputation: 2743

ANOVA using wide data table format

My original data are in wide format as displayed in Table A.

Let's say I want to research whether veterans who have experienced different tours of military service suffer from different levels of depression.

I decide to run a one-way ANOVA test on the data with Depression_Score as the criterion and 'tour of duty served' as the factor. I know I can reshape the data into long format as in Table B, and then run the ANOVA.

Here's my question though: is it possible to run an ANOVA test directly on Table A without reshaping the data into Table B?

If yes, then what R commands would I use to program this?

Table A:

ArmyVet_ID  Served_WW2  Served_KoreanWar    Served_VietnamWar   Depression_Score
110001          1              0                    0                3
110002          1              0                    0                1
110004          0              1                    0                4
110005          0              1                    0                3
110009          0              0                    1                7
110010          0              0                    1                5

Table B:

ArmyVet_ID    Served            Depression_Score
110001          WW2                    3
110002          WW2                    1
110004          KoreanWar              4
110005          KoreanWar              3
110009          VietnamWar             7
110010          VietnamWar             5

Upvotes: 2

Views: 857

Answers (1)

thelatemail

Reputation: 93908

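Yes. Drop one of the indicator columns (here Served_WW2, which becomes the reference level) and pass the remaining ones to as.matrix; the result matches what you get from the combined factor. One column has to go because the three indicators always sum to 1 and would otherwise be collinear with the intercept: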

anova(lm(Depression_Score ~ as.matrix(A[3:4]), data=A))
#Analysis of Variance Table
#
#Response: Depression_Score
#                  Df Sum Sq Mean Sq F value Pr(>F)
#as.matrix(A[3:4])  2 16.333  8.1667  5.4444 0.1004
#Residuals          3  4.500  1.5000 

Compare that with the result from a single factor, as in Table B of your example:

anova(lm(Depression_Score ~ I(factor(c(1,1,2,2,3,3))), data=A))
#Analysis of Variance Table
#
#Response: Depression_Score
#                               Df Sum Sq Mean Sq F value Pr(>F)
#I(factor(c(1, 1, 2, 2, 3, 3)))  2 16.333  8.1667  5.4444 0.1004
#Residuals                       3  4.500  1.5000
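
For anyone who wants to reproduce the comparison, here is a minimal self-contained sketch. It assumes A is a data frame laid out exactly like Table A in the question; the `served` vector and the `fit_*`, `tab_*`, and `same_*` names are introduced only for illustration and do not come from the original post.

```r
# Table A from the question, rebuilt as a data frame (assumed layout)
A <- data.frame(
  ArmyVet_ID        = c(110001, 110002, 110004, 110005, 110009, 110010),
  Served_WW2        = c(1, 1, 0, 0, 0, 0),
  Served_KoreanWar  = c(0, 0, 1, 1, 0, 0),
  Served_VietnamWar = c(0, 0, 0, 0, 1, 1),
  Depression_Score  = c(3, 1, 4, 3, 7, 5)
)

# Wide-format fit: two indicator columns; WW2 is the implicit reference level
fit_wide <- lm(Depression_Score ~ as.matrix(A[3:4]), data = A)

# Factor fit for comparison: derive the group label from whichever
# indicator column equals 1 in each row (illustrative helper, not in the post)
served <- factor(names(A)[2:4][max.col(as.matrix(A[2:4]))])
fit_long <- lm(Depression_Score ~ served, data = A)

tab_wide <- anova(fit_wide)
tab_long <- anova(fit_long)

# Both parameterisations describe the same model, so the fitted values
# and the overall F statistic should agree
same_fit <- isTRUE(all.equal(unname(fitted(fit_wide)), unname(fitted(fit_long))))
same_F   <- isTRUE(all.equal(tab_wide$`F value`[1], tab_long$`F value`[1]))

print(tab_wide)
print(c(same_fit = same_fit, same_F = same_F))
```

The factor here uses a different reference level than the matrix version, which changes the individual coefficients but not the fitted values or the ANOVA table.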

Upvotes: 1
