Reputation: 43
I am trying to implement the gradient descent algorithm in Python. When I plot the history of the cost function, it seems to be converging, but the mean absolute error I get with my implementation is much worse than the one I get from sklearn's linear_model. I can't figure out what is wrong with my implementation.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn import linear_model
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error
def gradient_descent(x, y, theta, alpha, num_iters):
    m = len(y)
    cost_history = np.zeros(num_iters)
    for iter in range(num_iters):
        h = np.dot(x, theta)
        for i in range(len(theta)):
            theta[i] = theta[i] - (alpha/m) * np.sum((h - y) * x[:,i])
        #save the cost in every iteration
        cost_history[iter] = np.sum(np.square((h - y))) / (2 * m)
    return theta, cost_history
attributes = [...]
class_field = [...]
x_df = pd.read_csv('train.csv', usecols = attributes)
y_df = pd.read_csv('train.csv', usecols = class_field)
#normalize
x_df = (x_df - x_df.mean()) / x_df.std()
#gradient descent
alpha = 0.01
num_iters = 1000
err = 0
i = 10
for i in range(i):
    x_train, x_test, y_train, y_test = train_test_split(x_df, y_df, test_size=0.2)
    x_train = np.array(x_train)
    y_train = np.array(y_train).flatten()
    theta = np.random.sample(len(x_df.columns))
    theta, cost_history = gradient_descent(x_train, y_train, theta, alpha, num_iters)
    err = err + mean_absolute_error(y_test, np.dot(x_test, theta))
    print(np.dot(x_test, theta))
    #plt.plot(cost_history)
    #plt.show()
print(err/i)
regr = linear_model.LinearRegression()
regr.fit(x_train, y_train)
y_pred = regr.predict(x_test)
print(mean_absolute_error(y_test, y_pred))
Upvotes: 3
Views: 1135
Reputation: 210832
It seems you have missed the bias / intercept column and its coefficient.
The hypothesis for a linear function should look like:
H = theta_0 + theta_1 * x
but in your implementation it looks as follows:
H = theta_1 * x
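As a minimal sketch of the fix (reusing the variable names from your question; add_intercept is just a hypothetical helper name), you can prepend a column of ones to the feature matrix so that the first entry of theta acts as the intercept:

import numpy as np

def add_intercept(x):
    # prepend a column of ones so theta[0] becomes the bias term
    x = np.array(x)
    return np.hstack([np.ones((x.shape[0], 1)), x])

x_train_b = add_intercept(x_train)
x_test_b = add_intercept(x_test)

# one extra coefficient for the bias column
theta = np.random.sample(x_train_b.shape[1])
theta, cost_history = gradient_descent(x_train_b, y_train, theta, alpha, num_iters)
print(mean_absolute_error(y_test, np.dot(x_test_b, theta)))

This matches what sklearn's LinearRegression does by default (fit_intercept=True), which is why its error is lower than your intercept-free model.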
Upvotes: 3