Reputation: 2867
I am using the Boston dataset as my input, and I am trying to build a model to predict MEDV (median value of owner-occupied homes in USD 1000s) using RM (average number of rooms per dwelling).
I have bastardised the following code from Digithead's blog, and not by very much, as you can see.
My code is as follows:
# the Boston dataset is in the MASS package, not datasets
library(MASS)
data("Boston")
x <- Boston$rm
y <- Boston$medv
# fit a linear model
res <- lm(y ~ x)
print(res)
Call:
lm(formula = y ~ x)

Coefficients:
(Intercept)            x
    -34.671        9.102
# plot the data and the model
plot(x, y, col = rgb(0.2, 0.4, 0.6, 0.4), main = 'Linear regression')
abline(res, col='blue')
# squared-error cost: J(theta) = sum((X %*% theta - y)^2) / (2 * m), with m = length(y)
cost <- function(X, y, theta) {
  sum((X %*% theta - y)^2) / (2 * length(y))
}
# learning rate and iteration limit
alpha <- 0.01
num_iters <- 1000
# keep history
cost_history <- double(num_iters)
theta_history <- vector("list", num_iters)  # preallocate; list(num_iters) would create a one-element list
# initialize coefficients
theta <- matrix(c(0,0), nrow=2)
# add a column of 1's for the intercept coefficient
X <- cbind(1, matrix(x))
# gradient descent
for (i in 1:num_iters) {
  error <- X %*% theta - y
  delta <- t(X) %*% error / length(y)  # gradient of the cost
  theta <- theta - alpha * delta
  cost_history[i] <- cost(X, y, theta)
  theta_history[[i]] <- theta
}
print(theta)
          [,1]
[1,] -3.431269
[2,]  4.191125
As per Digithead's blog, his values for theta from lm (the linear model) and from gradient descent match, whereas mine don't. Shouldn't these figures match?
As you can see from the plot of the various values of theta, my final y-intercept does not tally with the print(theta) value a few lines up.
Can anyone suggest where I am going wrong?
Upvotes: 1
Views: 780
Reputation: 3647
Gradient descent takes a while to converge; with alpha = 0.01, 1,000 iterations are nowhere near enough here. Increasing the number of iterations will get the model to converge to the OLS values. For instance:
# learning rate and iteration limit
alpha <- 0.01
num_iters <- 100000  # Here I increase the number of iterations in your code to 100k.
                     # The gd algorithm now takes a minute or so to run on my
                     # admittedly middle-of-the-line laptop.
# keep history
cost_history <- double(num_iters)
theta_history <- vector("list", num_iters)  # preallocate, as above
# initialize coefficients
theta <- matrix(c(0,0), nrow=2)
# add a column of 1's for the intercept coefficient
X <- cbind(1, matrix(x))
# gradient descent (now takes a little longer!)
for (i in 1:num_iters) {
  error <- X %*% theta - y
  delta <- (t(X) %*% error) / length(y)
  theta <- theta - alpha * delta
  cost_history[i] <- cost(X, y, theta)
  theta_history[[i]] <- theta
}
print(theta)
           [,1]
[1,] -34.670410
[2,]   9.102076
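As a side note, an alternative to brute-forcing more iterations is to standardize the predictor before running gradient descent: with a well-conditioned X, the original 1,000 iterations are enough. A minimal sketch, reusing x, y, and alpha from above (the names x_s, X_s, and theta_s are mine):
# standardize the predictor: zero mean, unit variance
x_s <- (x - mean(x)) / sd(x)
X_s <- cbind(1, matrix(x_s))
theta_s <- matrix(c(0, 0), nrow = 2)
for (i in 1:1000) {
  error <- X_s %*% theta_s - y
  theta_s <- theta_s - alpha * (t(X_s) %*% error) / length(y)
}
# map the coefficients back to the original scale of x
slope <- theta_s[2] / sd(x)
intercept <- theta_s[1] - slope * mean(x)
c(intercept, slope)  # approximately -34.671 and 9.102, matching lm()
Plotting cost_history (for example, plot(cost_history, type = 'l')) is also a quick way to check that the cost has flattened out, i.e. that the algorithm has actually converged.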
Upvotes: 1