Gaurav Sahu

Reputation: 191

Two different costs from the logistic regression cost function

I am implementing the logistic regression algorithm with two features, x1 and x2, and I am writing the code for the cost function:

def computeCost(X, y, theta):
    # m (the number of training examples) and sigmoid are defined elsewhere
    J = np.sum(-y * np.log(sigmoid(np.dot(X, theta)))
               - (1 - y) * np.log(1 - sigmoid(np.dot(X, theta)))) / m
    return J

Here X is the training-set matrix and y is the output vector. The shape of X is (100, 3) and the shape of y is (100,), as reported by NumPy's shape attribute. My theta initially contains all zeros and has shape (3, 1). When I compute the cost with these parameters I get 69.314, but that is incorrect; the correct cost is 0.69314. I do get the correct cost when I reshape my y vector with y = numpy.reshape(y, (-1, 1)), but I don't understand why this reshaping fixes the cost. Here m (the number of training examples) is 100.
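Here is a minimal, self-contained reproduction of both results, run after the computeCost definition above. The sigmoid helper and the random data are stand-ins for my real code (with theta all zeros every prediction is 0.5, so X and y don't affect the numbers):

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

m = 100
X = np.random.rand(m, 3)          # stand-in for the real training set
y = np.random.randint(0, 2, m)    # 0/1 labels, shape (100,)
theta = np.zeros((3, 1))

print(computeCost(X, y, theta))                       # 69.314...
print(computeCost(X, np.reshape(y, (-1, 1)), theta))  # 0.69314...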

Upvotes: 0

Views: 877

Answers (1)

Anu

Reputation: 3440

First of all, in future please don't simply dump your code! Your post (code + explanation) should be as descriptive as it can be (not verbose; nobody will read that), otherwise it's hard to read and answer. Here is what your code is doing, rewritten so it is readable:

def computeCost(X, y, theta):
    '''
    Binary cross-entropy (log loss) cost.

    X:     (100, 3)
    y:     (100, 1)
    theta: (3, 1)
    Returns the scalar cost:
    Cost = -sum( y*log(predictions) + (1-y)*log(1-predictions) ) / m
    '''
    m = len(y)
    # calculate the predictions
    predictions = sigmoid(np.dot(X, theta))

    # error when the label is of class 1
    class1_cost = -y * np.log(predictions)
    # error when the label is of class 0
    class2_cost = (1 - y) * np.log(1 - predictions)
    # total cost per example
    cost = class1_cost - class2_cost
    # average cost over the training set
    cost = cost.sum() / m
    return cost
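With correctly shaped inputs it behaves as expected; for example, reusing X, y, and theta from the question and the same sigmoid helper:

y = y.reshape(-1, 1)             # (100,) --> (100, 1)
print(computeCost(X, y, theta))  # 0.69314... when theta is all zeros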

You should first understand how the dot product works mathematically and what input shapes your algorithm expects in order to give the correct answer; don't feed it arbitrary shapes! Your feature matrix is of shape (100, 3), which, when multiplied by your theta of shape (3, 1), outputs a prediction vector of shape (100, 1).

Matrix multiplication: the product of an M x N matrix and an N x K matrix is an M x K matrix. The new matrix takes the rows of the first and the columns of the second.

So your y should be of shape (100, 1), not (100,). Huge difference! One is [[3], [4], [6], [7], [9], ...] and the other is [3, 4, 6, 7, 9, ...]. The dimensions must match to get the correct output.
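A quick way to see that difference:

import numpy as np

col  = np.array([[3], [4], [6]])   # shape (3, 1): a column vector
flat = np.array([3, 4, 6])         # shape (3,):   a 1-D array
print(col.shape, flat.shape)       # (3, 1) (3,)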

A better way of asking the question would be: how do I calculate the error/cost in logistic regression using the correct dimensions for my labels?

For additional understanding:

import numpy as np

label_type1 = np.random.rand(100, 1)
label_type2 = np.random.rand(100,)
predictions = np.random.rand(100, 1)
print(label_type1.shape, label_type2.shape, predictions.shape)

# 1. When you multiply (100,1) with (100,1) --> (100,1)
print((label_type1 * predictions).shape)

# 2. When you take a dot product of (100,1) with (100,1) --> Error, for which
#    you have to take a transpose, which isn't relevant to this context!
# print(np.dot(label_type1, predictions).shape)  # error: shapes (100,1) and (100,1) not aligned: 1 (dim 1) != 100 (dim 0)
print(np.dot(label_type1.T, predictions).shape)  # (1, 1)
print('*' * 5)

# 3. When you multiply (100,) with (100,1) --> (100,100) via broadcasting!
print((label_type2 * predictions).shape)

# 4. When you take a dot product of (100,) with (100,1) --> (1,)
print(np.dot(label_type2, predictions).shape)
print('*' * 5)

# 5. What you are doing: reshaping adds a dimension, (100,) --> (100,1)
label_type2_addDim = np.reshape(label_type2, (-1, 1))
print(label_type2_addDim.shape)  # (100, 1)

So, coming straight to the point: what you want is an element-wise cost of shape (100, 1). Either you do case 1 (which you aren't doing), or you do case 5, where reshaping unknowingly adds a dimension to your y, turning it from (100,) into (100,1), so that the same * operation as in case 1 gives a (100,1) result. Without the reshape you hit case 3: your (100,) labels broadcast against the (100,1) predictions into a (100,100) matrix, and summing that inflates the cost by a factor of 100, which is exactly why you saw 69.314 instead of 0.69314.
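So a minimal fix is to force the labels into an (m, 1) column before the element-wise math. A sketch, assuming the same sigmoid helper as above:

def computeCost(X, y, theta):
    m = len(y)
    y = np.reshape(y, (-1, 1))   # (100,) --> (100, 1); a no-op if already (100, 1)
    predictions = sigmoid(np.dot(X, theta))
    cost = -y * np.log(predictions) - (1 - y) * np.log(1 - predictions)
    return cost.sum() / m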

Upvotes: 2
