user19972299

Reputation: 11

Linear Regression Model with a variable that zeroes the result

For my class, we have to create a model to predict each individual's credit balance. Many of the observed balances are zero, yet lm still tries to calculate a value for them.

To work around this, I created a new variable that is 0 when two conditions both hold and 1 otherwise.

CB$Balzero <- ifelse(CB$Rating <= 230 & CB$Income < 90, 0, 1)

This flags 90% of the zero balances correctly. The problem is:

How can I include this variable in lm so that the model returns zero when the condition holds and the regression prediction when it does not?

Something like: lm=Balzero*(Balance~.)

Upvotes: 1

Views: 172

Answers (1)

Ben Bolker

Reputation: 226532

I think that

y ~ -1 + Balzero:Balance

might work (you haven't given us a reproducible example to try).

  • -1 tells R to omit the intercept
  • : specifies an interaction. If both variables are numeric, then A:B includes the product of A and B as a term in the model.

The second term could also be specified as I(Balzero*Balance). I() means "as is": it makes R interpret * in the usual arithmetic sense rather than as a formula operator.

These specifications should fit the model

Y = beta1*Balzero*Balance + eps

where eps is an error term.

If Balzero == 0, the predicted value will be zero. If Balzero == 1, the predicted value will be beta1*Balance.
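As a rough check of the two specifications, here is a minimal sketch on simulated data. Everything in it is an assumption made for illustration: x stands in for the numeric variable that gets multiplied by Balzero, and the sample size, the coefficient 2.5, and the noise level are made up.

```r
# Simulated data only; x, y, and the coefficient 2.5 are hypothetical.
set.seed(1)
n <- 200
d <- data.frame(
  Balzero = rbinom(n, 1, 0.6),    # 0/1 indicator, as in the question
  x       = runif(n, 100, 1000)   # hypothetical numeric predictor
)
d$y <- 2.5 * d$Balzero * d$x + rnorm(n, sd = 20)

# Two equivalent ways to specify the single product term, no intercept
fit_colon <- lm(y ~ -1 + Balzero:x, data = d)
fit_asis  <- lm(y ~ -1 + I(Balzero * x), data = d)

# Same estimate of beta1, only the coefficient name differs
coef(fit_colon)
coef(fit_asis)

# Predictions are exactly zero wherever Balzero == 0
pred <- predict(fit_colon, newdata = d)
all(pred[d$Balzero == 0] == 0)
```

Because there is no intercept and the only term is the product, a row with Balzero == 0 contributes a zero in the model matrix, so its prediction is zero regardless of the fitted coefficient.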

You might want to look into random forest models, which naturally incorporate the kind of qualitative splitting that you're doing by hand in your example.
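A minimal sketch of the random forest route, assuming the randomForest package is installed. The data are simulated: the column names Rating, Income, and Balance follow the question, but the ranges, the linear rule for nonzero balances, and ntree = 200 are made up for illustration.

```r
library(randomForest)  # assumes the randomForest package is installed

# Simulated stand-in for the CB data frame; all values are hypothetical.
set.seed(1)
n <- 300
CB <- data.frame(
  Rating = runif(n, 100, 600),
  Income = runif(n, 10, 180)
)
zero_rule <- CB$Rating <= 230 & CB$Income < 90   # threshold rule from the question
CB$Balance <- ifelse(
  zero_rule,
  0,
  pmax(0, 3 * CB$Rating - 2 * CB$Income + rnorm(n, sd = 30))
)

# The forest learns splits on Rating and Income without a hand-made indicator
rf <- randomForest(Balance ~ Rating + Income, data = CB, ntree = 200)

# In-sample predictions, averaged separately for the zero and nonzero groups
pred <- predict(rf, newdata = CB)
tapply(pred, zero_rule, mean)
```

The forest's predictions for the zero group are averages over trees, so they will generally be close to zero rather than exactly zero, unlike the product-term model above.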

Upvotes: 2
