Reputation: 377
I am currently working on a multi-class classifier using NumPy, and I finally got a working model using softmax, as follows:
import numpy as np

class MultinomialLogReg:
    def fit(self, X, y, lr=0.00001, epochs=1000):
        # prepend a bias column of ones, then normalize
        self.X = self.norm_x(np.insert(X, 0, 1, axis=1))
        self.y = y
        self.classes = np.unique(y)
        # one weight vector per class
        self.theta = np.zeros((len(self.classes), self.X.shape[1]))
        self.o_h_y = self.one_hot(y)
        for e in range(epochs):
            preds = self.probs(self.X)
            l, grad = self.get_loss(self.theta, self.X, self.o_h_y, preds)
            if e % 10000 == 0:
                print("epoch: ", e, "loss: ", l)
            # batch gradient descent step
            self.theta -= (lr * grad)
        return self

    def norm_x(self, X):
        # min-max scale each sample (row) to [0, 1]
        for i in range(X.shape[0]):
            mn = np.amin(X[i])
            mx = np.amax(X[i])
            X[i] = (X[i] - mn) / (mx - mn)
        return X

    def one_hot(self, y):
        # encode integer class labels as one-hot rows
        Y = np.zeros((y.shape[0], len(self.classes)))
        for i in range(Y.shape[0]):
            to_put = [0] * len(self.classes)
            to_put[y[i]] = 1
            Y[i] = to_put
        return Y

    def probs(self, X):
        # class probabilities: softmax over the linear scores
        return self.softmax(np.dot(X, self.theta.T))

    def get_loss(self, w, x, y, preds):
        m = x.shape[0]
        loss = (-1 / m) * np.sum(y * np.log(preds) + (1 - y) * np.log(1 - preds))
        # and compute the gradient for that loss
        grad = (1 / m) * np.dot((preds - y).T, x)
        return loss, grad

    def softmax(self, z):
        return np.exp(z) / np.sum(np.exp(z), axis=1).reshape(-1, 1)

    def predict(self, X):
        X = np.insert(X, 0, 1, axis=1)
        return np.argmax(self.probs(X), axis=1)
        # return np.vectorize(lambda i: self.classes[i])(np.argmax(self.probs(X), axis=1))

    def score(self, X, y):
        return np.mean(self.predict(X) == y)
I have several questions:
Is this a correct multinomial logistic regression implementation?
It takes 100,000 epochs with a learning rate of 0.1 for the loss to fall to between 1 and 0.5, and the model reaches 70-90% accuracy on the test set. Would this be considered bad performance?
What are some ways to improve performance or speed up training (so that fewer epochs are needed)?
I saw the cost function below online and it gives better accuracy. It looks like cross-entropy, but it differs from the cross-entropy optimization equations I have seen. Can someone explain how the two differ?
    error = preds - self.o_h_y
    grad = np.dot(error.T, self.X)
    self.theta -= (lr * grad)
Upvotes: 1
Views: 963
Reputation: 86
Rather than always running a fixed number of epochs, you can stop early once the gradient is effectively zero, e.g. when np.linalg.norm(grad) < 1e-8.
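A minimal sketch of how that check could slot into your fit loop; tol is a new, hypothetical parameter I am adding for illustration, and everything else reuses the methods from your class:

    def fit(self, X, y, lr=0.00001, epochs=1000, tol=1e-8):
        self.X = self.norm_x(np.insert(X, 0, 1, axis=1))
        self.y = y
        self.classes = np.unique(y)
        self.theta = np.zeros((len(self.classes), self.X.shape[1]))
        self.o_h_y = self.one_hot(y)
        for e in range(epochs):
            preds = self.probs(self.X)
            l, grad = self.get_loss(self.theta, self.X, self.o_h_y, preds)
            self.theta -= lr * grad
            # stop as soon as the gradient norm drops below the tolerance,
            # instead of always spending the full epoch budget
            if np.linalg.norm(grad) < tol:
                print("converged at epoch", e, "loss:", l)
                break
        return self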
A question for you: when you evaluate your test set, are you preprocessing it the same way you preprocess the training set in your fit function?
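If not, one option (just a sketch that mirrors the preprocessing your fit method already applies) is to run the test matrix through the same bias insertion and normalization before computing probabilities; otherwise the learned weights see inputs on a different scale than they were trained on:

    def predict(self, X):
        # mirror fit's preprocessing: prepend the bias column, then apply norm_x
        X = self.norm_x(np.insert(X, 0, 1, axis=1))
        return np.argmax(self.probs(X), axis=1)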
Upvotes: 1