Reputation: 6427
I am trying to build my own logistic regression function using stochastic gradient descent in R, but what I have right now makes the weights grow without bound and therefore never halts:
# Logistic regression via gradient descent
# Takes a matrix/data frame of training examples, an output vector,
# a learning-rate scalar, and a convergence delta limit scalar
my_logr <- function(training_examples, training_outputs, learn_rate, conv_lim) {
    # Initialize gradient vector
    gradient <- as.vector(rep(0, NCOL(training_examples)))
    # Difference between successive weight vectors
    del_weights <- as.matrix(1)
    # Weights
    weights <- as.matrix(runif(NCOL(training_examples)))
    weights_old <- as.matrix(rep(0, NCOL(training_examples)))
    # Compute gradient
    while (norm(del_weights) > conv_lim) {
        for (k in 1:NROW(training_examples)) {
            gradient <- gradient + 1/NROW(training_examples) *
                t(training_outputs[k] * training_examples[k, ] /
                  (1 + exp(training_outputs[k] * t(weights) %*% as.numeric(training_examples[k, ]))))
        }
        # Update weights
        weights <- weights_old - learn_rate * gradient
        del_weights <- as.matrix(weights_old - weights)
        weights_old <- weights
        print(weights)
    }
    return(weights)
}
The function can be tested with the following code:
data(iris)  # iris data ships with R
# Dataset for part a (first 50 vs. last 100)
iris_a <- iris
iris_a$Species <- as.integer(iris_a$Species)
# Convert the three-level factor to a binary class: setosa (1) vs. the rest (-1)
for (i in 1:NROW(iris_a$Species)) {
    if (iris_a$Species[i] != 1) {
        iris_a$Species[i] <- -1
    }
}
random_sample <- sample(1:NROW(iris), 50)
weights_a <- my_logr(iris_a[random_sample, 1:4], iris_a$Species[random_sample], 1, .1)
I double-checked my algorithm against Abu-Mostafa's, which is as follows:
gradient <- -1/N * sum_{n = 1 to N} (training_output_n * training_example_n /
                                     (1 + exp(training_output_n * dot(weight, training_example_n))))
weight_new <- weight - learn_rate * gradient
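In conventional notation, that is:

$$\nabla E(\mathbf{w}) = -\frac{1}{N}\sum_{n=1}^{N}\frac{y_n\,\mathbf{x}_n}{1+e^{\,y_n\,\mathbf{w}^{\top}\mathbf{x}_n}}, \qquad \mathbf{w}_{\text{new}} = \mathbf{w} - \eta\,\nabla E(\mathbf{w})$$

where $y_n \in \{-1,+1\}$ is the label, $\mathbf{x}_n$ the feature vector, and $\eta$ the learning rate.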
Am I missing something here?
Upvotes: 6
Views: 2870
Reputation: 6427
From a mathematical perspective, an unconstrained magnitude on the weight vector does not yield a unique solution: when the classes are linearly separable (and setosa vs. the rest is), scaling the weights up always improves the likelihood, so gradient descent keeps growing them without bound. When I added these two lines to the classifier function, it converged in two steps:
# Normalize
weights <- weights / norm(weights)
...
# Update weights
weights <- weights_old - learn_rate * gradient
weights <- weights / norm(weights)
I couldn't get @SimonO101's answer to work, and I'm not using this code for real work (there are built-ins like glm), so loops that I understand are good enough.
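For reference, here is a minimal glm sketch of the same setosa-vs-rest fit (iris_b and is_setosa are my own naming). Because the classes are linearly separable, glm itself warns that fitted probabilities are numerically 0 or 1, which is the same non-uniqueness problem:

# Built-in logistic regression for comparison (a sketch)
data(iris)
iris_b <- iris
iris_b$is_setosa <- as.integer(iris_b$Species == "setosa")  # 1 = setosa, 0 = other
fit <- glm(is_setosa ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
           data = iris_b, family = binomial)
coef(fit)  # expect a separation warning here, since setosa is linearly separable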
The whole function is as follows:
# Logistic regression via gradient descent, with weight normalization
# Takes a matrix/data frame of training examples, an output vector,
# a learning-rate scalar, and a convergence delta limit scalar
my_logr <- function(training_examples, training_outputs, learn_rate, conv_lim) {
    # Difference between successive weight vectors
    del_weights <- as.matrix(1)
    # Weights, started at random and normalized to unit length
    weights <- as.matrix(runif(NCOL(training_examples)))
    weights_old <- as.matrix(rep(0, NCOL(training_examples)))
    # Normalize
    weights <- weights / norm(weights)
    while (norm(del_weights) > conv_lim) {
        # Compute gradient, resetting it each pass so passes don't accumulate
        gradient <- as.vector(rep(0, NCOL(training_examples)))
        for (k in 1:NROW(training_examples)) {  # over examples (rows), not columns
            gradient <- gradient - 1/NROW(training_examples) *
                t(training_outputs[k] * training_examples[k, ] /
                  (1 + exp(training_outputs[k] * t(weights) %*% as.numeric(training_examples[k, ]))))
        }
        # Vectorized attempt (couldn't get it to work):
        # gradient <- -1/NROW(training_examples) * sum(training_outputs * training_examples / (1 + exp(training_outputs * weights %*% training_outputs)))
        # Update weights and re-normalize
        weights <- weights_old - learn_rate * gradient
        weights <- weights / norm(weights)
        del_weights <- as.matrix(weights_old - weights)
        weights_old <- weights
        print(weights)
    }
    return(weights)
}
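A quick usage sketch, reusing the test setup from the question (results vary with the random sample and the random starting weights):

data(iris)
iris_a <- iris
iris_a$Species <- ifelse(as.integer(iris_a$Species) == 1, 1, -1)  # setosa vs. rest
random_sample <- sample(1:NROW(iris), 50)
weights_a <- my_logr(iris_a[random_sample, 1:4],
                     iris_a$Species[random_sample],
                     learn_rate = 1, conv_lim = 0.1)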
Upvotes: 3
Reputation: 59970
There are a few problems. Firstly, you can make much better use of R's vectorisation. Secondly, I am not an expert in stochastic gradient descent, but the algorithm you give below your question does not correspond to how you calculate the gradient in your function. Check this code carefully, but it seems to converge, and I think it follows Abu-Mostafa's. I gather that you want to calculate the gradient like this:
gradient <- -1/N * sum(training_outputs * training_examples / (1 + exp(training_outputs * dot(weights, training_examples))))
So this part of your algorithm should read...
while (norm(del_weights) > conv_lim) {
    # colSums keeps one gradient component per feature;
    # a plain sum() would collapse the gradient to a single scalar
    gradient <- -1/NROW(training_examples) *
        colSums(training_outputs * as.matrix(training_examples) /
                as.vector(1 + exp(training_outputs * (as.matrix(training_examples) %*% weights))))
    # Update weights
    weights <- weights_old - learn_rate * gradient
    del_weights <- as.matrix(weights_old - weights)
    weights_old <- weights
    print(weights)
}
You can create a binary classification from the Species variable more easily using:
iris_a$Species <- as.numeric( iris_a$Species )
iris_a$Species[ iris_a$Species != 1 ] <- -1
I cannot tell you if the results returned are sensible, but that code should follow the gradient step you quoted. Check each step carefully, and remember R is vectorised, so you can do element-wise operations on vectors without loops, e.g.:
x <- 1:5
y <- 1:5
x*y
#[1] 1 4 9 16 25
Upvotes: 1