king_sules

Reputation: 39

Multiplying a categorical variable with a dummy in regression

I am trying to regress scores on a female dummy (taking a value of 0 or 1), and I also have the country of each respondent. I want to add a fixed effect in which female is interacted with country, but every method I have tried fails because I am multiplying a numeric variable by a factor.

I have tried using the fastDummies package, but that did not work. I also tried the country-1 method and then multiplying the result by female, with no success.

# first attempt (wrong)
olss1 <- lm(pv1math ~ female + I(ggi * female) + factor(country) + factor(year) + I(female * factor(country)), data = f1)
# second attempt (wrong)
olss1 <- lm(pv1math ~ female + I(ggi * female) + factor(country) + factor(year) + factor(female * country), data = f1)

The error messages say that I cannot multiply a factor by a numeric.

Upvotes: 1

Views: 2173

Answers (2)

jay.sf

Reputation: 72593

You don't need I() here. Inside a formula, * on its own specifies an interaction (together with the main effects), whereas I() evaluates its contents as ordinary arithmetic before the regression is fitted.

Compare:

lm(pv1math ~ ggi*female, data=dat)$coefficients
# (Intercept)         ggi      female  ggi:female 
#         ...         ...         ...         ... 

lm(pv1math ~ I(ggi*female), data=dat)$coefficients
# (Intercept) I(ggi * female) 
#         ...             ... 

I() is useful e.g. for polynomials, where age is a popular candidate: pv1math ~ age + I(age^2) + I(age^3), or to binarize a dependent variable in a GLM: glm(I(pv1math > 0.75) ~ ggi*female, family=binomial).

And - as @G.Grothendieck already wrote - you don't need to repeat the variables that are already present in the interaction term (it's just redundant), so you may want to try:

lm(pv1math ~ ggi*female + factor(year) + female*factor(country), data=f1)
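
To check that the shorter formula really is equivalent to spelling out every term, here is a minimal sketch on simulated data. The data frame below is only a stand-in for f1: the country labels, years, coefficients and the assumption that ggi varies from row to row are all made up for illustration and are not taken from the question.

```r
# Simulated stand-in for the asker's data frame f1; every value is made up.
set.seed(42)
n <- 300
f1 <- data.frame(
  country = sample(c("A", "B", "C"), n, replace = TRUE),
  year    = sample(c(2009, 2012), n, replace = TRUE),
  female  = rbinom(n, 1, 0.5),
  ggi     = runif(n, 0.6, 0.8)  # assumption: varies by row, not only by country
)
f1$pv1math <- 500 + 30 * f1$ggi - 10 * f1$female + rnorm(n, sd = 20)

# Short form: * expands to main effects plus the interaction
fit_short <- lm(pv1math ~ ggi*female + factor(year) + female*factor(country),
                data = f1)

# Long form: the same terms written out one by one, using : for interactions
fit_long <- lm(pv1math ~ ggi + female + factor(year) + factor(country) +
                 ggi:female + female:factor(country),
               data = f1)

# The interaction columns appear as female:factor(country)<level>
names(coef(fit_short))

# Both specifications describe the same model
all.equal(fitted(fit_short), fitted(fit_long))
```

If ggi in the real data is constant within each country, some of these terms would be collinear with the country effects and lm() would report NA for them; the simulation deliberately avoids that case.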

Upvotes: 0

G. Grothendieck

Reputation: 269421

The * operator in the formula will give interactions as well as lower order terms. Here is an example:

country <- c("A", "A", "A", "B", "B", "B")
female <- c(1, 1, 0, 1, 0, 1)
y <- 1:6

fm <- lm(y ~ country * female)
fm

giving:

Call:
lm(formula = y ~ country * female)

Coefficients:
    (Intercept)         countryB           female  countryB:female  
            3.0              2.0             -1.5              1.5  

We can also check the model matrix:

model.matrix(fm)

giving

  (Intercept) countryB female countryB:female
1           1        0      1               0
2           1        0      1               0
3           1        0      0               0
4           1        1      1               1
5           1        1      0               0
6           1        1      1               1
attr(,"assign")
[1] 0 1 2 3
attr(,"contrasts")
attr(,"contrasts")$country
[1] "contr.treatment"

Upvotes: 1
