NuValue

Reputation: 463

NaNs produced when plotting a linear model (lm) with R

I am trying to build an ordinary linear regression model and a logistic one to predict fraud in real estate data. I work with a mixed data set (categorical and numerical variables), which I pre-processed and recoded so that the levels of each categorical variable are reasonably balanced (for example, avoiding variables where some levels have a single observation while others have many). I added an interaction to increase the R^2 of my lm. When I plot my linear model I get this warning:

    Warning messages:
    1: In sqrt(crit * p * (1 - hh)/hh) : NaNs produced
    2: In sqrt(crit * p * (1 - hh)/hh) : NaNs produced

It appears to be related to Cook's distance (https://bugs.r-project.org/bugzilla3/show_bug.cgi?format=multiple&id=9316), that is, to influential observations, even though I removed outliers. Any idea what is causing this warning and what can be done to plot the linear model?

Example of my code:

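    library(MASS)  # provides stepAIC()

    # Main effects plus their interactions with file_status, on the log scale
    lm.a3 <- lm(log(response) ~ (. - file_status) * file_status, data = data)
    final.lm3 <- stepAIC(lm.a3, direction = "both")  # stepwise selection by AIC
    summary(final.lm3)  # R^2 = 64%
    par(mfrow = c(2, 2))  # 2 x 2 grid for the four diagnostic plots
    plot(final.lm3)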

Thanks for your time and I appreciate your answers

Upvotes: 2

Views: 8162

Answers (1)

NuValue

Reputation: 463

The problem was that I applied a logarithm transformation to the response before running stepAIC to improve the fit. Because some of my response values were equal to 1, log(response_variable) was exactly zero for those cases. Adding a small constant to the argument of the logarithm resolved the issue: log(response_variable + 0.0001234). Thanks to @LyzandeR for the feedback.
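A minimal sketch of that check and fix, on made-up data since the original data set is not shown. The names `x`, `response`, and `offset_eps` are hypothetical stand-ins, and the single predictor is only there to keep the example short; it is not meant to reproduce the original model or the warning itself.

```r
# Synthetic stand-in data (assumption: not the original real estate data).
set.seed(42)
n <- 200
x <- runif(n)
response <- round(exp(1 + 2 * x + rnorm(n, sd = 0.5)))
response[response < 1] <- 1   # keep the response strictly positive
response[1:10] <- 1           # force some responses to be exactly 1

# log(1) is exactly 0, so count the rows whose transformed response is zero.
n_zero_log <- sum(log(response) == 0)

# The fix described above: add a small constant inside the logarithm.
offset_eps <- 0.0001234
fit <- lm(log(response + offset_eps) ~ x)

# After the shift, no transformed response is exactly zero.
n_zero_after <- sum(log(response + offset_eps) == 0)

# Leverages (hat values) are what the residuals-vs-leverage panel of
# plot() is built on, so they are worth inspecting alongside the fix.
lev <- hatvalues(fit)
range(lev)
```

Comparing `n_zero_log` with `n_zero_after` shows how many observations the shift affects; the constant itself is arbitrary, so it may be worth checking that the fitted coefficients do not change much for other small values.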

Upvotes: 3
