JohnK

Reputation: 1039

Three factor logistic regression with interactions

I have a three-factor contingency table that explores the association between the crime committed (shoplifting or other theft acts), gender and prior convictions on the one hand, and lenient sentences on the other. Lenient sentence is the response variable and is binary: 1 for receiving a lenient sentence, 0 otherwise.

            Crime Gender Priorconv Yes No
1      Shoplifting    Men         N  24  1
2 Other Theft Acts    Men         N  52  9
3      Shoplifting  Women         N  48  3
4 Other Theft Acts  Women         N  22  2
5      Shoplifting    Men         P  17  6
6 Other Theft Acts    Men         P  60 34
7      Shoplifting  Women         P  15  6
8 Other Theft Acts  Women         P   4  3

You can recreate the table using these commands:

table1<-expand.grid(Crime=factor(c("Shoplifting","Other Theft Acts")),Gender=factor(c("Men","Women")),
Priorconv=factor(c("N","P")))

table1<-data.frame(table1,Yes=c(24,52,48,22,17,60,15,4),No=c(1,9,3,2,6,34,6,3))

I have been trying to run a logistic regression but quickly ran into trouble when I tried to include interactions between my variables. The glm works perfectly without the interactions. The code I have been using is

fit<-glm(cbind(Yes,No)~Crime+Gender+Priorconv+I(Crime*Priorconv),data=table1,family=binomial)

and the error I have been getting

Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : 
  contrasts can be applied only to factors with 2 or more levels
In addition: Warning message:
In Ops.factor(Crime, Priorconv) : * not meaningful for factors

Could you please tell me how to deal with this error?

Thank you

Upvotes: 2

Views: 432

Answers (2)

Ben Bolker

Reputation: 226077

By specifying I(Crime*Priorconv) you are asking R to compute the value Crime*Priorconv, which it refuses to do (because it doesn't make sense to multiply factors). If Crime and Priorconv were already numeric dummy variables (e.g. 0/1 coding with 0=shoplifting, 1=other and 0=N, 1=P) then it would make sense to multiply them, and you would use the I() notation to indicate that you wanted to multiply them.
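
To make the dummy-variable case concrete, here is a minimal sketch. The column names crime_other and prior_P are made up for illustration and do not appear in the question. It codes the two factors as 0/1 numerics, multiplies them inside I(), and compares the deviance with the factor-based fit that uses `:`.

```r
# Rebuild the table from the question
table1 <- expand.grid(
  Crime     = factor(c("Shoplifting", "Other Theft Acts")),
  Gender    = factor(c("Men", "Women")),
  Priorconv = factor(c("N", "P"))
)
table1 <- data.frame(table1,
                     Yes = c(24, 52, 48, 22, 17, 60, 15, 4),
                     No  = c(1, 9, 3, 2, 6, 34, 6, 3))

# Hypothetical 0/1 dummies (names are illustrative, not from the question):
# 0 = shoplifting, 1 = other theft acts; 0 = N, 1 = P
table1$crime_other <- as.numeric(table1$Crime == "Other Theft Acts")
table1$prior_P     <- as.numeric(table1$Priorconv == "P")

# With numeric variables, I() protects an ordinary arithmetic product
fit_dummy <- glm(cbind(Yes, No) ~ crime_other + Gender + prior_P +
                   I(crime_other * prior_P),
                 data = table1, family = binomial)

# The same model written with factors and the ':' interaction operator
fit_factor <- glm(cbind(Yes, No) ~ Crime + Gender + Priorconv + Crime:Priorconv,
                  data = table1, family = binomial)

# Compare the two fits; the deviances should agree up to numerical tolerance
all.equal(deviance(fit_dummy), deviance(fit_factor))
```

The individual coefficients can differ between the two fits because the reference levels differ (R orders factor levels alphabetically by default), but both describe the same fitted model.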

Otherwise (if you don't use I()), R interprets * as "interaction plus all lower-order effects", i.e. Crime*Priorconv corresponds to 1+Crime+Priorconv+Crime:Priorconv (where : denotes the interaction). R automatically handles the redundancies (i.e. the fact that you have already specified the main effects of Crime and Priorconv): in a formula context, repeating main effects makes no difference, and neither does including or omitting the explicit intercept (1). These formulae all specify the same model:

1+Crime+Priorconv+Crime:Priorconv
Crime+Priorconv+Crime*Priorconv
Crime+Priorconv+Crime:Priorconv
Crime*Priorconv

but I prefer the last one: as @J.R. points out in his answer, you can take advantage of the * notation to express your model more compactly.
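
One way to check this equivalence yourself is to ask R how it expands each formula. The sketch below uses terms() on one-sided versions of the four right-hand sides; the helper names rhs, term_labels and intercepts are just illustrative.

```r
# The four right-hand sides listed above, written as one-sided formulas
rhs <- list(
  ~ 1 + Crime + Priorconv + Crime:Priorconv,
  ~ Crime + Priorconv + Crime * Priorconv,
  ~ Crime + Priorconv + Crime:Priorconv,
  ~ Crime * Priorconv
)

# terms() expands '*' and drops duplicated terms
term_labels <- lapply(rhs, function(f) attr(terms(f), "term.labels"))

# The "intercept" attribute is 1 when the model includes an intercept
intercepts <- sapply(rhs, function(f) attr(terms(f), "intercept"))

# Print the expanded terms for each formula
print(term_labels)
print(intercepts)

# TRUE if every formula expands to the same set of terms as the first one
all(sapply(term_labels, identical, term_labels[[1]]))
```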

Upvotes: 5

J.R.

Reputation: 3878

You can use x:y in the formula to specify the interaction between x and y, e.g.:

fit<-glm(cbind(Yes,No)~Crime+Gender+Priorconv+Crime:Priorconv,data=table1,family=binomial)

or a little shorter:

fit<-glm(cbind(Yes,No)~Gender+Crime*Priorconv,data=table1,family=binomial)
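
As a rough check (a sketch, assuming table1 is built as in the question), you can fit both versions and confirm they give the same coefficients, then compare against the main-effects-only model to see what the interaction adds. The names fit_long, fit_short and fit_additive are illustrative.

```r
# Rebuild the table from the question
table1 <- expand.grid(
  Crime     = factor(c("Shoplifting", "Other Theft Acts")),
  Gender    = factor(c("Men", "Women")),
  Priorconv = factor(c("N", "P"))
)
table1 <- data.frame(table1,
                     Yes = c(24, 52, 48, 22, 17, 60, 15, 4),
                     No  = c(1, 9, 3, 2, 6, 34, 6, 3))

# The two equivalent ways of writing the interaction model
fit_long  <- glm(cbind(Yes, No) ~ Crime + Gender + Priorconv + Crime:Priorconv,
                 data = table1, family = binomial)
fit_short <- glm(cbind(Yes, No) ~ Gender + Crime * Priorconv,
                 data = table1, family = binomial)

# Coefficients come out in a different order, so align them by name first
all.equal(coef(fit_long)[names(coef(fit_short))], coef(fit_short))

# Main-effects-only model, for a likelihood-ratio comparison
fit_additive <- glm(cbind(Yes, No) ~ Crime + Gender + Priorconv,
                    data = table1, family = binomial)
anova(fit_additive, fit_short, test = "Chisq")
```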

Upvotes: 3
