Reputation: 218
I am trying to code logistic regression from scratch. In the code below, I thought my cost derivative was my regularization, but I've been tasked with adding L1 norm regularization. How do you add this in Python? Should it be added where I have defined the cost derivative? Any help in the right direction is appreciated.
import numpy as np

def Sigmoid(z):
    return 1 / (1 + np.exp(-z))

def Hypothesis(theta, X):
    return Sigmoid(X @ theta)

def Cost_Function(X, Y, theta, m):
    hi = Hypothesis(theta, X)
    _y = Y.reshape(-1, 1)
    J = 1/float(m) * np.sum(-_y * np.log(hi) - (1 - _y) * np.log(1 - hi))
    return J

def Cost_Function_Derivative(X, Y, theta, m, alpha):
    hi = Hypothesis(theta, X)
    _y = Y.reshape(-1, 1)
    J = alpha/float(m) * X.T @ (hi - _y)
    return J

def Gradient_Descent(X, Y, theta, m, alpha):
    new_theta = theta - Cost_Function_Derivative(X, Y, theta, m, alpha)
    return new_theta

def Accuracy(theta):
    # X_test and Y_test are assumed to be defined elsewhere
    length = len(X_test)
    prediction = (Hypothesis(theta, X_test) > 0.5)
    _y = Y_test.reshape(-1, 1)
    correct = prediction == _y
    my_accuracy = (np.sum(correct) / length) * 100
    print('LR Accuracy: ', my_accuracy, "%")

def Logistic_Regression(X, Y, alpha, theta, num_iters):
    m = len(Y)
    for x in range(num_iters):
        new_theta = Gradient_Descent(X, Y, theta, m, alpha)
        theta = new_theta
        if x % 100 == 0:
            # print('theta: ', theta)
            # print('cost: ', Cost_Function(X, Y, theta, m))
            pass
    Accuracy(theta)

ep = .012
initial_theta = np.random.rand(X_train.shape[1], 1) * 2 * ep - ep
alpha = 0.5
iterations = 10000
Logistic_Regression(X_train, Y_train, alpha, initial_theta, iterations)
Upvotes: 4
Views: 1635
Reputation: 579
Either the marked answer or the code itself behaves strangely when checked:
import numpy as np
import pandas as pd
from scipy.special import expit

## e = 0.2

def Sigmoid(z):
    return expit(-z)

def Hypothesis(theta, X):
    return Sigmoid(X @ theta)

def Cost_Function(X, Y, theta, m):
    hi = Hypothesis(theta, X)
    _y = Y.reshape(-1, 1)
    J = 1/m * np.sum(-_y * np.log(hi) - (1 - _y) * np.log(1 - hi))
    ## J = J + e * np.sum(abs(theta))
    return J

def Cost_Function_Derivative(X, Y, theta, m):
    h = Hypothesis(theta, X)
    _y = Y.reshape(-1, 1)
    J = 1/m * X.T @ (h - _y)
    ## J = J + alpha * e * (theta >= 0).astype(float)
    return J

def Gradient_Descent(X, Y, theta, m, alpha):
    new_theta = theta - alpha * Cost_Function_Derivative(X, Y, theta, m)
    return new_theta

def Accuracy(theta):
    length = len(X_test)
    prediction = (Hypothesis(theta, X_test) > 0.5)
    _y = y_test.reshape(-1, 1)
    correct = prediction == _y
    my_accuracy = (np.sum(correct) / length) * 100
    print('hand-made LR Accuracy: ', my_accuracy, "%")

def Logistic_Regression(X, Y, alpha, theta, num_iters):
    m = len(Y)
    for x in range(num_iters):
        new_theta = Gradient_Descent(X, Y, theta, m, alpha)
        # update
        theta = new_theta
        if x % 100 == 0:
            # print('theta: ', theta)
            # print('cost: ', Cost_Function(X, Y, theta, m))
            pass
    Accuracy(theta)

ep = .02

########## sklearn
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score

X, y = make_blobs(1000, n_features=2, centers=2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42
)

# fit(X_train)
sc = StandardScaler()
sc.fit(X)
X = sc.transform(X)

from sklearn.linear_model import LogisticRegression
model_lr = LogisticRegression(C=ep, penalty="l1", tol=0.01, solver="saga", random_state=10)
model_lr.fit(X_train, y_train)

# predict(X_test)
y_pred_lr = model_lr.predict(X_test)
print("sklearn Accuracy Score: ", accuracy_score(y_pred_lr, y_test) * 100)

########### hand-made
initial_theta = np.random.rand(X_train.shape[1], 1)
alpha = 0.2
iterations = 10000
Logistic_Regression(X_train, y_train, alpha, initial_theta, iterations)

# sklearn Accuracy Score:  95.45454545454545
# hand-made LR Accuracy:  50.60606060606061 %
This implementation gives more comparable results, e.g.:
Accuracy on test set by model at link: 94.01197604790418
Accuracy on test set by sklearn model: 95.20958083832335
P.S. backpropagation algorithm
Upvotes: 0
Reputation: 6499
Regularization adds a term to the cost function so that there is a compromise between minimizing the cost and keeping the model parameters small, which reduces overfitting. You control how much of a compromise you want by scaling the regularization term with a scalar e.
So just add the L1 norm of theta to the original cost function:
J = J + e * np.sum(abs(theta))
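For concreteness, here is a minimal sketch of what the question's Cost_Function could look like with this penalty added (the extra parameter e for the regularization strength is an assumption, not part of the original signature):

def Cost_Function(X, Y, theta, m, e):
    # assumes np (numpy) and the question's Hypothesis are in scope
    hi = Hypothesis(theta, X)
    _y = Y.reshape(-1, 1)
    J = 1/float(m) * np.sum(-_y * np.log(hi) - (1 - _y) * np.log(1 - hi))
    # L1 penalty: e times the L1 norm of the parameters
    J = J + e * np.sum(np.abs(theta))
    return J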
Since this term is added to the cost function, it should also be taken into account when computing the gradient of the cost function.
This is simple, because the derivative of a sum is the sum of the derivatives, so we just need to figure out the derivative of the term sum(abs(theta)). Since it is piecewise linear in theta, its derivative is piecewise constant: it equals 1 where theta >= 0 and -1 where theta < 0 (note that the derivative is mathematically undefined at 0, but we don't care about that here).
So in the function Cost_Function_Derivative we add:
J = J + alpha * e * np.where(theta >= 0, 1.0, -1.0)
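Putting it together, a minimal sketch of the modified Cost_Function_Derivative could look like the following (again, the extra parameter e is an assumption, and np.where implements the +1/-1 sign described above). The penalty term is multiplied by alpha because the question's Cost_Function_Derivative already folds the learning rate alpha into the gradient:

def Cost_Function_Derivative(X, Y, theta, m, alpha, e):
    # assumes np (numpy) and the question's Hypothesis are in scope
    hi = Hypothesis(theta, X)
    _y = Y.reshape(-1, 1)
    J = alpha/float(m) * X.T @ (hi - _y)
    # subgradient of e * sum(|theta|): +1 where theta >= 0, -1 where theta < 0
    J = J + alpha * e * np.where(theta >= 0, 1.0, -1.0)
    return J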
Upvotes: 4