Reputation: 742
I am following Andrew Ng's Coursera course and trying to write a basic Python implementation of gradient descent using the housing data that I believe he also used in the slides (it can be found here). I am not using numpy, scikit-learn, or any other library; I just want working code for a 1D input and output, fitting a line of the form theta0 + theta1 * x (2 parameters).
My code is very simple, yet whether I increase or decrease the learning rate or let it run for more iterations, it still diverges. I have tried several other formulations of the update and it still diverges. I have made sure the data loads properly. Here is the code:
dataset_f = open("housing_prices.csv", "r")
dataset = dataset_f.read().split("\n")
xs = []
ys = []
for line in dataset:
    split = line.split(",")
    xs.append(int(split[0]))
    ys.append(int(split[2]))
m = float(len(xs))
learning_rate = 1e-5
theta0 = 0
theta1 = 0
n_steps = 1
def converged():
    return n_steps > 1000
while not converged():
    print("Step #" + str(n_steps))
    print("θ Naught: {}".format(theta0))
    print("θ One: {}".format(theta1))
    theta0_gradient = (1.0 / m) * sum([(theta0 + theta1 * xs[i] - ys[i]) for i in range(int(m))])
    theta1_gradient = (1.0 / m) * sum([(theta0 + theta1 * xs[i] - ys[i]) * xs[i] for i in range(int(m))])
    theta0_temp = theta0 - learning_rate * theta0_gradient
    theta1_temp = theta1 - learning_rate * theta1_gradient
    theta0 = theta0_temp
    theta1 = theta1_temp
    n_steps += 1
print(theta0)
print(theta1)
Theta naught and theta one very quickly become nan because they blow up to infinity. I did notice that both oscillate between positive and negative values and grow in magnitude on every step. For example:
Step #1
θ Naught: 0
θ One: 0
Step #2
θ Naught: 3.4041265957446813
θ One: 7642.091281914894
Step #3
θ Naught: -146.0856377478662
θ One: -337844.5760108272
Step #4
θ Naught: 6616.511688310662
θ One: 15281052.424862152
Step #5
θ Naught: -299105.2400554526
θ One: -690824180.132845
Step #6
θ Naught: 13522088.241560074
θ One: 31231058614.54401
Step #7
θ Naught: -611311852.8608981
θ One: -1411905961438.4395
Step #8
θ Naught: 27636426469.18927
θ One: 63829999475126.086
Step #9
θ Naught: -1249398426624.6619
θ One: -2885651696197370.0
Step #10
θ Naught: 56483294981582.41
θ One: 1.304556757051869e+17
Step #11
θ Naught: -2553518992810967.5
θ One: -5.89769144561785e+18
Step #12
θ Naught: 1.1544048994968486e+17
θ One: 2.6662515218056607e+20
Step #13
θ Naught: -5.218879028251596e+18
θ One: -1.2053694641507752e+22
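The sign flip and steady growth in this trace are what gradient descent does on a quadratic cost when the step is too large for the steepest direction. As a rough check, assuming the cost is the usual mean squared error, the sketch below estimates the per-step growth factor from the printed θ One values and the learning-rate bound that factor implies. The helper `growth_factors` is hypothetical, and the numbers are copied from steps 2 to 7 above.

```python
# theta1 values copied from steps 2-7 of the trace above
theta1_trace = [
    7642.091281914894,
    -337844.5760108272,
    15281052.424862152,
    -690824180.132845,
    31231058614.54401,
    -1411905961438.4395,
]
learning_rate = 1e-5  # the rate used in the question


def growth_factors(values):
    """Ratio of each value to the one before it (hypothetical helper)."""
    return [b / a for a, b in zip(values, values[1:])]


ratios = growth_factors(theta1_trace)
factor = ratios[-1]  # the ratio settles once theta1 dwarfs its optimum

# On a quadratic cost, each step scales the dominant direction by
# (1 - learning_rate * curvature), so the curvature can be backed out.
curvature = (1.0 - factor) / learning_rate

# Convergence needs |1 - learning_rate * curvature| < 1,
# which means learning_rate < 2 / curvature.
max_stable_rate = 2.0 / curvature

print("growth factor per step: {:.2f}".format(factor))
print("estimated curvature: {:.3g}".format(curvature))
print("largest stable learning rate: {:.3g}".format(max_stable_rate))
```

Under these assumptions the bound comes out at a few times 1e-7, well below the 1e-5 used here, which would explain why the updates overshoot and flip sign on every step.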
Upvotes: 1
Views: 2201
Reputation: 1578
I've gotten your code working with some minor changes. Ignore the imports; they are purely for my own plotting. This version should use your new dataset. The main changes were using a separate learning rate for each parameter (1e-3 for theta0, 1e-7 for theta1), running for more iterations, and removing a few unnecessary casts.
import matplotlib.pyplot as plt
import numpy as np
dataset_f = open("actual_housing_prices.csv", "r")
dataset = dataset_f.read().split("\n")
xs = []
ys = []
for line in dataset:
    split = line.split(",")
    xs.append(int(split[0]))
    ys.append(int(split[2]))
m = len(xs)
learning_rate1 = 1e-7
learning_rate2 = 1e-3
theta0 = 0
theta1 = 0
n_steps = 1
def converged():
    return n_steps > 100000
while not converged():
    print("Step #" + str(n_steps))
    print("Theta Naught: {}".format(theta0))
    print("Theta One: {}".format(theta1))
    theta0_gradient = (1.0 / m) * sum([theta0 + theta1*xs[i] - ys[i] for i in range(m)])
    theta1_gradient = (1.0 / m) * sum([(theta0 + theta1*xs[i] - ys[i]) * xs[i] for i in range(m)])
    # theta0 and theta1 each get their own learning rate
    theta0_temp = theta0 - learning_rate2 * theta0_gradient
    theta1_temp = theta1 - learning_rate1 * theta1_gradient
    theta0 = theta0_temp
    theta1 = theta1_temp
    n_steps += 1
print(theta0)
print(theta1)
# Sum of signed residuals: actual price minus the prediction theta0 + theta1*x
print("Error: {}".format(sum([ys[i] - (theta0 + theta1*xs[i]) for i in range(m)])))
plt.plot(xs, ys, 'ro')
plt.axis([0, max(xs), 0, max(ys)])
my_vals = list(np.arange(0, max(xs), 0.02))
# A list comprehension instead of map(), which returns an iterator on Python 3
plt.plot(my_vals, [theta0 + theta1*q for q in my_vals], '-bo')
plt.show()
Here's the resulting line using the two optimized weights:
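Two learning rates are needed above because the θ1 gradient is far larger than the θ0 gradient: in the question's first step, θ One moves over two thousand times further than θ Naught. A common alternative, which is not what the code above does, is to standardize x first so that a single learning rate works for both parameters. A minimal sketch, assuming made-up data points rather than the housing CSV; `fit_standardized` and its default arguments are hypothetical:

```python
def fit_standardized(xs, ys, learning_rate=0.1, n_steps=1000):
    """Fit y = theta0 + theta1 * x by gradient descent on standardized x."""
    m = len(xs)
    mean_x = sum(xs) / float(m)
    std_x = (sum((x - mean_x) ** 2 for x in xs) / float(m)) ** 0.5
    # Standardized inputs have mean 0 and unit variance
    zs = [(x - mean_x) / std_x for x in xs]

    b0 = 0.0  # intercept in standardized space
    b1 = 0.0  # slope in standardized space
    for _ in range(n_steps):
        errors = [b0 + b1 * z - y for z, y in zip(zs, ys)]
        b0_gradient = sum(errors) / m
        b1_gradient = sum(e * z for e, z in zip(errors, zs)) / m
        # Both gradients are computed before either parameter changes
        b0 -= learning_rate * b0_gradient
        b1 -= learning_rate * b1_gradient

    # Convert back to the original units of x
    theta1 = b1 / std_x
    theta0 = b0 - b1 * mean_x / std_x
    return theta0, theta1


# Made-up (size, price) pairs for illustration only; not the housing data.
example_xs = [1000, 1500, 2000, 2500, 3000]
example_ys = [200000, 270000, 340000, 410000, 480000]

theta0, theta1 = fit_standardized(example_xs, example_ys)
print("theta0: {}".format(theta0))
print("theta1: {}".format(theta1))
```

The made-up points lie exactly on the line 60000 + 140 * x, so the fit should recover an intercept near 60000 and a slope near 140. On real data the fitted values will differ, and the learning rate may still need some tuning.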
Upvotes: 1