generic_user
generic_user

Reputation: 3562

"Vectorizing" this for-loop in R? (suppressing interaction main effects in lm)

When interactions are specified in lm, R includes main effects by default, with no option to suppress them. This is usually appropriate and convenient, but there are certain instances (within estimators, ratio LHS variables, among others) where this isn't appropriate.

I've got this code that fits a log-transformed variable to a response variable, independently within subsets of the data.

Here is a silly yet reproducible example:

id = as.factor(c(1,2,2,3,3,3,4,4,4,4,5,5,5,5,6,7,7,8,8,8,9,9,9,9,10))
x = rexp(length(id))
y = rnorm(length(id))
logx = log(x)
data = data.frame(id,y,logx)

for (i in data$id){
    sub = subset(data, id==i)   #This splits the data by id
    m = lm(y~logx-1,data=sub)   #This gives me the linear (log) fit for one of my id's
    sub$x.tilde = log(1+3)*m$coef   #This linearizes it and gives me the expected value for x=3
    data$x.tilde[data$id==i] = sub$x.tilde #This puts it back into the main dataset
    data$tildecoeff[data$id==i] = m$coef #This saves the coefficient (I use it elsewhere for plotting)
    }

I want to fit a model like the following:

Y = B(X*id) +e

with no intercept and no main effect of id. As you can see from the loop, I'm interested in the expectation of Y when X=3, constrained the fit through the origin (because Y is a (logged) ratio of Y[X=something]/Y[X=0].

But if I specify

m = lm(Y~X*as.factor(id)-1)

there is no means of suppressing the main effects of id. I need to run this loop several hundred times in an iterative algorithm, and as a loop it is far too slow.

The other upside of de-looping this code is that it'll be much more convenient to get prediction intervals.

(Please, I don't need pious comments about how leaving out main effects and intercepts is improper -- it usually is, but I can promise that it isn't in this instance).

Thanks in advance for any ideas!

Upvotes: 0

Views: 483

Answers (1)

Jan van der Laan
Jan van der Laan

Reputation: 8105

I think you want

m <- lm(y ~ 0 + logx : as.factor(id))

see R-intro '11.1 Defining statistical models; formulae'

Upvotes: 1

Related Questions