TheGoat

Reputation: 2867

Difference in Theta values between gradient descent and linear model in R

I am using the Boston dataset as my input and I am trying to build a model to predict MEDV (median value of owner-occupied homes in USD 1000s) using RM (average number of rooms per dwelling).

I have bastardised the following code from Digithead's blog, and not by very much, as you can see.

My code is as follows:

library(MASS)   # the Boston dataset lives in the MASS package
data("Boston")

x <- Boston$rm
y <- Boston$medv

# fit a linear model
res <- lm( y ~ x )
print(res)

Call:
lm(formula = y ~ x)

Coefficients:
(Intercept)            x  
    -34.671        9.102

# plot the data and the model
plot(x,y, col=rgb(0.2,0.4,0.6,0.4), main='Linear regression')
abline(res, col='blue')


# squared error cost function
cost <- function(X, y, theta) {
  sum( (X %*% theta - y)^2 ) / (2*length(y))
}

# learning rate and iteration limit
alpha <- 0.01
num_iters <- 1000

# keep history
cost_history <- double(num_iters)
theta_history <- vector("list", num_iters)  # preallocate a list of length num_iters

# initialize coefficients
theta <- matrix(c(0,0), nrow=2)

# add a column of 1's for the intercept coefficient
X <- cbind(1, matrix(x))

# gradient descent
for (i in 1:num_iters) {
  error <- (X %*% theta - y)
  delta <- t(X) %*% error / length(y)
  theta <- theta - alpha * delta
  cost_history[i] <- cost(X, y, theta)
  theta_history[[i]] <- theta
}

print(theta)

          [,1]
[1,] -3.431269
[2,]  4.191125

As per Digithead's blog, his values for theta from lm (the linear model) and from gradient descent match, whereas mine don't. Shouldn't these figures match?

As you can see from the plot of the various values of theta, my final y-intercept does not tally with the print(theta) value a few lines up.

Can anyone make a suggestion as to where I am going wrong?

[Plot: Linear Regression Gradient Descent]

Upvotes: 1

Views: 780

Answers (1)

gfgm

Reputation: 3647

Gradient descent takes a while to converge. Increasing the number of iterations will get the model to converge to the OLS values. For instance:

# learning rate and iteration limit
alpha <- 0.01
num_iters <- 100000 # Here I increase the number of iterations in your code to 100k. 
# The gd algorithm now takes a minute or so to run on my admittedly 
# middle-of-the-line laptop.

# keep history
cost_history <- double(num_iters)
theta_history <- vector("list", num_iters)  # preallocate a list of length num_iters

# initialize coefficients
theta <- matrix(c(0,0), nrow=2)

# add a column of 1's for the intercept coefficient
X <- cbind(1, matrix(x))

# gradient descent (now takes a little longer!)
for (i in 1:num_iters) {
  error <- (X %*% theta - y)
  delta <- (t(X) %*% error) / length(y)
  theta <- theta - alpha * delta
  cost_history[i] <- cost(X, y, theta)
  theta_history[[i]] <- theta
}

print(theta)
     [,1]
[1,] -34.670410
[2,]   9.102076
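
If you would rather not wait for 100k iterations, an alternative (not from the original answer, just a sketch reusing your x and y) is to standardise the predictor before running gradient descent. With a well-scaled column the problem is much better conditioned, so a few hundred iterations are enough; you then map the coefficients back to the original scale. The names x_scaled, X_s, theta_s, slope and intercept below are mine:

# Sketch: gradient descent on a standardised predictor
x_scaled <- (x - mean(x)) / sd(x)        # centre and scale RM
X_s      <- cbind(1, x_scaled)           # design matrix with intercept column

theta_s <- matrix(c(0, 0), nrow = 2)     # start from zero as before
alpha   <- 0.1                           # a larger step is safe on scaled data

for (i in 1:1000) {
  error   <- X_s %*% theta_s - y
  theta_s <- theta_s - alpha * (t(X_s) %*% error) / length(y)
}

# map back to the original x scale: y ~ intercept + slope * x
slope     <- theta_s[2] / sd(x)
intercept <- theta_s[1] - slope * mean(x)
c(intercept, slope)                      # should be close to -34.671 and 9.102

Either way, plotting cost_history from your original run is a quick way to see that after 1,000 iterations the cost is still falling, i.e. the algorithm simply hadn't converged yet.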

Upvotes: 1
