Prudhvi Charan

Reputation: 115

Coefficients of linear regression y=mx+c using lm() differ in magnitude from what I expect

# Column names containing spaces must be quoted with backticks, not single quotes
ddd = lm(`USER ID` ~ `CREATED ON`, data = Data1)
summary(ddd)

The slope of the line in the second image should be approximately (6000 - 0)/(2017 - 2016) = 6000, but the slope shown in the first image is 2.204e-04. How does this make sense?

(USER ID and CREATED ON correspond to the number of users and the time shown in the plot.)

Please see the attached output image.

Upvotes: 0

Views: 462

Answers (1)

Zheyuan Li

Reputation: 73415

I generated the plot using plot(Data1$`CREATED ON`, Data1$`USER ID`, cex = 0.5, xlab = "Time", ylab = "No.Of Users"), then abline(lm(`USER ID` ~ `CREATED ON`, Data1), col = 4).

At time = 2017 the number of users is about 6000, and at time = 2016 it is about 0, so the slope must be (6000 - 0)/(2017 - 2016) = 6000, but the slope shown is of magnitude 10^-4.

The CREATED ON column is a date-time type: class(Data1$`CREATED ON`) gives "POSIXct" "POSIXt".

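Check as.integer(Data1$`CREATED ON`). Date and date-time objects are stored internally as numbers that can be very large: a POSIXct value is the number of seconds since 1970-01-01 UTC. The fitted slope is therefore in users per second, not users per year. About 6000 users per year is roughly 6000 / (365 * 24 * 3600), or about 1.9e-04 users per second, which is the same order of magnitude as the reported 2.204e-04.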
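To make the unit issue concrete, here is a minimal sketch on synthetic data. It does not use the asker's Data1; the vectors created_on and user_id and the constant seconds_per_year are made up for illustration. It fits the same kind of model twice, once against the raw POSIXct values and once against time rescaled to years.

```r
# Hypothetical data: 6000 users signing up evenly between 2016 and 2017.
created_on <- seq(as.POSIXct("2016-01-01", tz = "UTC"),
                  as.POSIXct("2017-01-01", tz = "UTC"),
                  length.out = 6000)
user_id <- seq_along(created_on)

# POSIXct is treated as seconds since 1970-01-01 UTC,
# so this slope is in users per second.
fit_sec <- lm(user_id ~ created_on)
slope_per_second <- unname(coef(fit_sec)[2])

# Rescale the predictor to years to get a slope in users per year.
seconds_per_year <- 365.25 * 24 * 3600
years <- as.numeric(created_on) / seconds_per_year
fit_year <- lm(user_id ~ years)
slope_per_year <- unname(coef(fit_year)[2])

print(slope_per_second)  # on the order of 1e-04
print(slope_per_year)    # on the order of 6000
```

The two slopes describe the same line; they differ only by the factor seconds_per_year. If a per-year slope is what you want to read off the summary, rescaling the predictor before fitting is one straightforward option.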


In general, why not just extract the model matrix to see what the columns are?

model.matrix.lm(ddd)

This immediately exposes the problem. Regression coefficients are computed using this model matrix.
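As a small illustration of what the model matrix reveals, the sketch below uses three made-up timestamps (created_on and user_id are hypothetical, not the asker's data) and prints the design matrix that lm() works with.

```r
# Hypothetical data: three timestamps spaced 30 days apart.
created_on <- as.POSIXct("2016-01-01", tz = "UTC") + (0:2) * 30 * 86400
user_id <- c(1, 2, 4)

fit <- lm(user_id ~ created_on)

# The design matrix has an intercept column and the date-time column
# converted to plain numbers (seconds since 1970-01-01 UTC).
X <- model.matrix(fit)
print(X)
```

The created_on column holds values in the billions that increase by 2592000 (30 days in seconds) per row, which is why a coefficient expressed per unit of that column looks so small.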

Upvotes: 1
