Reputation: 25
Is there a way to run a linear regression in R with interaction terms between a continuous and a categorical variable, but excluding the continuous variable itself?
I am studying the relation between housing rents and dwelling floorspace. There are four different regions in my dataset, and I assume that the relation differs across them. I am using a linear regression of rent on region and on the interaction between floorspace and region. I want coefficients on region and on the interaction terms, but using lm with an interaction term forces floorspace to appear as an independent variable, too.
That's how it goes:
lm(formula = rent ~ factor(region) + factor(region) * floorspace,
data = mydataset)
Coefficients:
                            Estimate Std. Error t value Pr(>|t|)
(Intercept)                  4.67252    0.06792  68.792  < 2e-16 ***
factor(region)2             -0.39859    0.09453  -4.216 2.52e-05 ***
factor(region)3             -0.23631    0.17870  -1.322 0.186078
factor(region)4             -0.49076    0.10329  -4.751 2.07e-06 ***
floorspace                  -0.38658    0.01539 -25.119  < 2e-16 ***
factor(region)2:floorspace   0.20481    0.02145   9.550  < 2e-16 ***
factor(region)3:floorspace  -0.00884    0.03987  -0.222 0.824552
factor(region)4:floorspace   0.08022    0.02348   3.416 0.000638 ***
What I want instead is this:
Coefficients:
                            Estimate Std. Error t value Pr(>|t|)
(Intercept)                  4.67252    0.06792  68.792  < 2e-16 ***
factor(region)2             -0.39859    0.09453  -4.216 2.52e-05 ***
factor(region)3             -0.23631    0.17870  -1.322 0.186078
factor(region)4             -0.49076    0.10329  -4.751 2.07e-06 ***
factor(region)1:floorspace  -0.38658    0.01539 -25.119  < 2e-16 ***
factor(region)2:floorspace  -0.18177    ???????   ?????  ???????
factor(region)3:floorspace  -0.39543    ???????   ?????  ???????
factor(region)4:floorspace  -0.30636    ???????   ?????  ???????
The reason is that, from an interpretation point of view, it makes more sense to show the effect of floorspace for each region separately, instead of showing it for region=1 through the floorspace term and for the other regions as the difference between that region's effect and the effect for region=1.
Upvotes: 0
Views: 3259
Reputation: 94162
First I'll make a test data set:
mydataset = data.frame(rent=runif(100), region=sample(1:4, 100, TRUE), floorspace=runif(100))
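Take the linear term in floorspace out of the formula by subtracting it. With that main effect gone, R fits a separate floorspace slope for every region, including region 1: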
summary(lm(formula = rent ~ factor(region) + factor(region) * floorspace - floorspace, data=mydataset))
Call:
lm(formula = rent ~ factor(region) + factor(region) * floorspace -
    floorspace, data = mydataset)

Residuals:
     Min       1Q   Median       3Q      Max
-0.52917 -0.26151  0.01225  0.24816  0.52392

Coefficients:
                           Estimate Std. Error t value Pr(>|t|)
(Intercept)                 0.50329    0.09238   5.448 4.23e-07 ***
factor(region)2             0.01331    0.13804   0.096    0.923
factor(region)3             0.05716    0.16860   0.339    0.735
factor(region)4            -0.03252    0.16234  -0.200    0.842
factor(region)1:floorspace  0.16273    0.22805   0.714    0.477
factor(region)2:floorspace  0.01638    0.19894   0.082    0.935
factor(region)3:floorspace -0.14251    0.20262  -0.703    0.484
factor(region)4:floorspace -0.05094    0.24191  -0.211    0.834
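If you want to convince yourself that this is only a reparametrisation and not a different model, here is a rough sketch of a check. The fixed seed, the names fit_diff and fit_slope, and storing region as a factor in the data frame are my own choices for illustration, not part of the question. It also uses the shorter formula rent ~ region + region:floorspace, which should expand to the same set of terms as the subtraction version above.

```r
# Toy data as above; the seed is an arbitrary choice to make the run repeatable.
set.seed(1)
mydataset <- data.frame(
  rent       = runif(100),
  region     = factor(sample(1:4, 100, replace = TRUE)),
  floorspace = runif(100)
)

# Default parametrisation: slope for region 1 plus differences from it.
fit_diff <- lm(rent ~ region * floorspace, data = mydataset)

# Per-region slopes: region main effect plus the interaction only.
fit_slope <- lm(rent ~ region + region:floorspace, data = mydataset)

b_diff  <- coef(fit_diff)
b_slope <- coef(fit_slope)

# Slope for region k = reference slope + region k's difference (0 for region 1).
slopes_from_diff <- b_diff["floorspace"] +
  c(0, b_diff[paste0("region", 2:4, ":floorspace")])
names(slopes_from_diff) <- paste0("region", 1:4, ":floorspace")

# Both comparisons should come back TRUE: same slopes, same fitted values.
print(all.equal(slopes_from_diff, b_slope[names(slopes_from_diff)]))
print(all.equal(fitted(fit_diff), fitted(fit_slope)))

# Standard errors and p-values for the per-region slopes.
print(summary(fit_slope)$coefficients)
```

The same arithmetic applies to the numbers in the question: -0.38658 + 0.20481 = -0.18177 for region 2. What the refit adds is the standard errors and t-tests for each per-region slope, which you cannot read off the default output directly.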
Upvotes: 2