Reputation: 11
For my class we have to create a model to predict each individual's credit balance. Many of the observed balances are zero, but `lm` still tries to calculate a nonzero value for them.
To work around this, I created an indicator variable that is 0 when two conditions both hold and 1 otherwise:
CB$Balzero = ifelse(CB$Rating<=230 & CB$Income<90,0,1)
This gets 90% of the zero results right. The problem is: how can I put this variable into the `lm` call so that it returns zero when the condition is true and the regression estimate when it is false?
Something like: lm=Balzero*(Balance~.)
Upvotes: 1
Views: 172
Reputation: 226532
I think that
y ~ -1 + Balzero:Balance
might work (you haven't given us a reproducible example to try).
`-1` tells R to omit the intercept; `:` specifies an interaction. If both variables are numeric, then `A:B` includes the product of `A` and `B` as a term in the model. The second term could also be specified as `I(Balzero*Balance)` (`I` means "as is", i.e. interpret `*` in the usual numerical sense, not in its formula-construction context).
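To check that the two ways of writing the product term agree, here is a minimal sketch on simulated data. The data frame `d`, the sample size, and the true slope of 0.5 are all made up for illustration; `y` is the response and `Balance` is treated as a numeric predictor, as in the formula above.

```r
# Toy data (hypothetical): y is the response, Balance a numeric predictor,
# Balzero a 0/1 indicator, mirroring the names used in the formula above.
set.seed(1)
n <- 50
d <- data.frame(
  Balzero = rep(c(0, 1), length.out = n),
  Balance = runif(n, 100, 1000)
)
d$y <- 0.5 * d$Balzero * d$Balance + rnorm(n, sd = 5)

# Same model written two ways: interaction term vs. explicit product.
fit_colon <- lm(y ~ -1 + Balzero:Balance, data = d)
fit_asis  <- lm(y ~ -1 + I(Balzero * Balance), data = d)

# Each fit has a single coefficient and no intercept.
coef(fit_colon)
coef(fit_asis)

# Compare the estimates, ignoring the differing coefficient names.
all.equal(unname(coef(fit_colon)), unname(coef(fit_asis)))
```

The coefficient names differ (`Balzero:Balance` versus `I(Balzero * Balance)`), which is why the comparison strips them with `unname()`.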
These specifications should fit the model
Y = beta1*Balzero*Balance + eps
where `eps` is an error term. If `Balzero == 0`, the predicted value will be zero. If `Balzero == 1`, the predicted value will be `beta1*Balance`.
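The behaviour of the predictions can be illustrated with a small sketch, again on simulated data. Everything here (the data frame `d`, the two rows in `new_obs`, the value 400) is hypothetical and only meant to show the zero / `beta1*Balance` split described above.

```r
# Toy data (hypothetical), using the same variable names as the formula above.
set.seed(2)
n <- 40
d <- data.frame(
  Balzero = rep(c(0, 1), each = n / 2),
  Balance = runif(n, 100, 1000)
)
d$y <- 0.5 * d$Balzero * d$Balance + rnorm(n, sd = 5)

# No-intercept model with a single product term.
fit <- lm(y ~ -1 + Balzero:Balance, data = d)
beta1 <- unname(coef(fit))

# Two new observations with the same Balance but different Balzero.
new_obs <- data.frame(Balzero = c(0, 1), Balance = c(400, 400))
pred <- unname(predict(fit, newdata = new_obs))

pred[1]                # Balzero == 0: the product term vanishes, so this is 0
pred[2] - beta1 * 400  # Balzero == 1: difference from beta1 * Balance
```

Because the model has no intercept and no other terms, a row with `Balzero == 0` contributes nothing to the linear predictor, so its prediction is zero by construction rather than by estimation.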
You might want to look into random forest models, which naturally incorporate the kind of qualitative splitting that you're doing by hand in your example.
Upvotes: 2