Hutchins

Reputation: 85

Problem when doing ANOVA on a categorical variable in R

I have two variables: decsorgs2 and regionfactor (which is "region" converted to a factor).

freq(decsorgs2)
 decsorgs2 
            Frequency Percent
 0 Disagree       365   53.76
 1 Agree          314   46.24
 Total            679  100.00

freq(regionfactor)
regionfactor 
        Frequency Percent
 1            12   1.767
 2            82  12.077
 3           128  18.851
 4            64   9.426
 5           138  20.324
 6            43   6.333
 7            53   7.806
 8            57   8.395
 9           102  15.022
 Total       679 100.000

I am trying to do an ANOVA with aov().

aov(decsorgs2 ~ regionfactor)
Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) : 
  NA/NaN/Inf in 'y'
In addition: Warning message:
In model.response(mf, "numeric") : NAs introduced by coercion

What do these errors mean? I don't understand any of these terms.

EDIT: Ok, I did a hail-mary random attempt and recoded decsorgs2.

Initially I had:

decsorgs2 = recode(DECSORGS, "4:5='0 Disagree'; 1:2='1 Agree'")

Now I used:

decsorgs2 = recode(DECSORGS, "4:5=0; 1:2=1")

It seemed to work. But why? Why does decsorgs2 have to be numeric, when the whole point of converting region to a factor was to have it read as categorical? How do I know which variable has to be numeric and which categorical?

Upvotes: 1

Views: 4796

Answers (1)

Roland

Reputation: 132706

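aov needs a continuous (numeric) response variable, i.e. the variable on the left of the ~. The predictor on the right can be a factor, which is why regionfactor is fine as it is. You are passing a character variable as the response, so it is coerced to numeric, and labels such as "0 Disagree" cannot be parsed as numbers and become NA: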

y <- c("0 Disagree", "1 Agree")
as.numeric(y)
#[1] NA NA
#Warning message:
#NAs introduced by coercion 

y <- c("0", "1")
as.numeric(y)
#[1] 0 1
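
To connect this to the question, here is a minimal sketch of the same coercion on a vector shaped like decsorgs2. The vector contents and the names decsorgs2_chr, decsorgs2_num and decsorgs2_fac are made up for illustration; they are not the asker's data.

```r
# Hypothetical stand-in for decsorgs2 as first recoded (character labels)
decsorgs2_chr <- c("0 Disagree", "1 Agree", "1 Agree", "0 Disagree")

# Coercion fails: the labels cannot be parsed as numbers, so every value is NA
suppressWarnings(as.numeric(decsorgs2_chr))

# Taking the leading digit yields a numeric 0/1 vector usable as a response
decsorgs2_num <- as.numeric(substr(decsorgs2_chr, 1, 1))
decsorgs2_num

# A factor keeps the labels for display while storing integer codes underneath
decsorgs2_fac <- factor(decsorgs2_chr)
as.integer(decsorgs2_fac) - 1L
```

Recoding to plain 0 and 1, as in the edit to the question, sidesteps the problem because those values survive the conversion to numeric.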

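You need to reconsider your statistical methodology, though: a 0/1 response is binary rather than continuous, so even though aov now runs, ANOVA is a questionable choice here. A method for binary outcomes, such as logistic regression, is usually more appropriate.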
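
One possible reading of that advice, sketched under assumptions: treat the 0/1 variable as a binary outcome and model it with logistic regression, or test the two categorical variables for independence. The data below is simulated only so the sketch runs (the sample size matches the question, the values do not), and the names dat and fit are hypothetical.

```r
set.seed(1)  # simulated data, not the asker's survey
n <- 679
dat <- data.frame(
  decsorgs2    = rbinom(n, size = 1, prob = 0.46),
  regionfactor = factor(sample(1:9, n, replace = TRUE), levels = 1:9)
)

# Logistic regression: binary response on the left, factor predictor on the right
fit <- glm(decsorgs2 ~ regionfactor, data = dat, family = binomial)

# Likelihood-ratio test for an overall effect of region
anova(fit, test = "Chisq")

# Alternative for two categorical variables: chi-squared test of independence
chisq.test(table(dat$regionfactor, dat$decsorgs2))
```

Which of the two fits better depends on the research question; the sketch only shows that both accept a categorical predictor without any numeric recoding of region.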

Upvotes: 1
