RACHEL

Reputation: 67

Multiclass logistic regression from scratch

I’m trying to implement multiclass logistic regression from scratch. The dataset is MNIST.

I built functions for the hypothesis, sigmoid, cost function, cost function derivative, and gradient descent. My code is below.

I’m struggling with the following:

All images are labeled with the digit they represent, so there are 10 classes in total.

Inside the gradient descent function I need to loop through each class, but I do not know how to apply the One-vs-All method there.

In other words, what I need to do is: for each class, build a binary label vector (1 for that class, 0 for the others) and fit a separate set of thetas.

Here is my code.

import numpy as np
import pandas as pd


# Only the training data set;
# the test data will be loaded later.

url='https://drive.google.com/file/d/1-MO8oCfq4KU361QeeL4DdafVBhZePUNT/view?usp=sharing'
url='https://drive.google.com/uc?id=' + url.split('/')[-2]
df = pd.read_csv(url,header = None)

X = df.values[:, 0:-1]
y = df.values[:, -1]

m = np.size(X, 0)

y = np.array(y).reshape(m, 1)
X = np.c_[ np.ones(m), X ] # Bias


def hypothesis(X, thetas):
    return sigmoid( X.dot(thetas)) #- 0.0000001 

def sigmoid(z):
    return 1/(1+np.exp(-z))

def losscost(X, y, m, thetas):
    h = hypothesis(X, thetas)
    return -(1/m) * ( y.dot(np.log(h)) + (1-y).dot(np.log(1-h)) )


def derivativelosscost(X, y, m, thetas):
    h = hypothesis(X, thetas)  
    return (h-y).dot(X)/m

def descendinggradient(X, y, m, epoch, alpha, thetas):

    n = np.size(X, 1)
    J_historico = []

    for i in range(epoch):

        for j in range(0, 10):  # 10 classes

            # How do I filter each class here (inside descendinggradient)?

            # The 2 lines below are wrong.
            # thetas = thetas - alpha * derivativelosscost(X, y, m, thetas)
            # J_historico = J_historico + [losscost(X, y, m, thetas)]
            pass

    return [thetas, J_historico]


alpha = 0.01
epoch = 50
thetas = np.zeros((np.size(X, 1), 1))  # initial thetas; shape probably needs one column per class
(thetas, J_historico) = descendinggradient(X, y, m, epoch, alpha, thetas)

# After that, how do I build a function to predict the test set?

Upvotes: 0

Views: 587

Answers (1)

MaKaNu

Reputation: 1008

Let me explain this problem step by step:

First, since your code doesn't provide the actual data (or an accessible link to it), I've created a random dataset and then applied the same commands you used to create X and y:

batch_size = 20
num_classes = 10


rng = np.random.default_rng(seed=42)
df = pd.DataFrame(
    4 * rng.random((batch_size, num_classes + 1)) - 2,  # random values between -2 and 2
    columns=['X0','X1','X2','X3','X4','X5','X6','X7','X8', 'X9','Y']
)


X = df.values[:, 0:-1]
y = df.values[:, -1]

m = np.size(X, 0)

y = np.array(y).reshape(m, 1)
X = np.c_[ np.ones(m), X ] # Bias

Next, let's take a look at your hypothesis function. If we just run hypothesis and look at the first sample, we get a vector of size 10. I also needed to provide initial thetas for this case:

thetas = rng.random((X.shape[1],num_classes))

h = hypothesis(X, thetas)

print(h[0])

>>>[0.89701729 0.90050806 0.98358408 0.81786334 0.96636732 0.97819512
 0.89118488 0.87238045 0.70612173 0.30256924]

Basically, the function calculates a "probability"[1] for each class.

At this point we reach the first issue in your code. The sigmoid function returns "probabilities" that are not "connected" to each other. To put these "probabilities" in relation to one another we need another function: SOFTMAX. You will find plenty of implementations of this function. In short: it calculates the "probabilities" based on the sigmoid output, so that the sum over all class "probabilities" equals 1.
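
A minimal sketch of such a softmax (assuming it operates row-wise on the (samples, classes) output of hypothesis) might look like this:

def softmax(z):
    # subtract the row-wise maximum for numerical stability
    z = z - np.max(z, axis=1, keepdims=True)
    e = np.exp(z)
    # normalize so that each row sums to 1
    return e / np.sum(e, axis=1, keepdims=True)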

So for your second question, "How to implement a prediction after training?", we only need to take the argmax value to determine the class:

h = hypothesis(X, thetas)
p = softmax(h) # needs to be implemented
prediction = np.argmax(p, axis=1)
print(prediction)

>>>[2 5 5 8 3 5 2 1 3 5 2 3 8 3 3 9 5 1 1 8]

Now that we know how to predict a class, we also need to know where to set up the training. We want to do this directly after the softmax function, but instead of using the argmax to determine the winning class, we use the cost function and its derivative. The problem in your code: you used the cross-entropy loss for a binary problem. The binary case also doesn't need the softmax function, because the sigmoid already provides the connection between the two binary classes. And since we are not interested in the value of the multiclass cross-entropy loss itself but only in its derivative, we want to calculate the derivative directly; a sketch follows below.
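
As a sketch of that derivative: assuming y holds integer class labels 0–9 (as in MNIST, not the random dataset above) and that the softmax is applied directly to the linear scores X·thetas (skipping the sigmoid), the gradient simplifies to the well-known (p - y) form. Y_onehot below is a hypothetical helper, not part of your code:

# one-hot encode the integer labels (assumes y contains class indices 0..9)
Y_onehot = np.zeros((m, num_classes))
Y_onehot[np.arange(m), y.astype(int).ravel()] = 1

p = softmax(X.dot(thetas))        # (m, num_classes) class "probabilities"
grad = X.T.dot(p - Y_onehot) / m  # derivative of the multiclass cross-entropy w.r.t. thetas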

The conversion from binary cross-entropy to the multiclass case is somewhat unintuitive at first glance. I recommend reading a bit about it before implementing it. After that you basically use your line:

thetas = thetas - alpha * derivativelosscost(X, y, m, thetas)

for updating the thetas. Putting the pieces together, a minimal training loop could then look like the sketch below.
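
This is only a sketch under the same assumptions as above (softmax on the linear scores, the Y_onehot matrix, and the num_classes variable from the random dataset); none of the names are fixed:

def train(X, Y_onehot, epochs, alpha, thetas):
    m = X.shape[0]
    J_historico = []
    for i in range(epochs):
        p = softmax(X.dot(thetas))        # class "probabilities" for every sample
        grad = X.T.dot(p - Y_onehot) / m  # multiclass cross-entropy gradient
        thetas = thetas - alpha * grad    # your update line, applied to all classes at once
        # track the multiclass cross-entropy loss (small epsilon avoids log(0))
        J_historico.append(-np.sum(Y_onehot * np.log(p + 1e-12)) / m)
    return thetas, J_historico

thetas = rng.random((X.shape[1], num_classes))
thetas, J_historico = train(X, Y_onehot, 50, 0.01, thetas)
prediction = np.argmax(softmax(X.dot(thetas)), axis=1)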

[1] These are not actual probabilities, but that is a completely different topic.

Upvotes: 1
