I would like to calculate the cost for softmax regression. The cost function I want to compute is given at the end of this question.
With NumPy I can compute the cost as follows:
"""
X.shape = (300, 2)  # floats
y.shape = (300,)    # integers
W.shape = (2, 3)
b.shape = (3,)
"""
import numpy as np
np.random.seed(100)
# Data and labels
X = np.random.randn(300,2)
y = np.ones(300)
y[0:100] = 0
y[200:300] = 2
y = y.astype(np.int64)
# weights and bias
W = np.random.randn(2,3)
b = np.random.randn(3)
N = X.shape[0]
scores = np.dot(X, W) + b
hyp = np.exp(scores - np.max(scores, axis=1, keepdims=True))  # shift each row for numerical stability
probs = hyp / np.sum(hyp, axis=1, keepdims=True)  # normalize over the 3 classes
logprobs = np.log(probs[range(N),y])
cost_data = -1/N * np.sum(logprobs)
print("hyp.shape = {}".format(hyp.shape)) # hyp.shape = (300, 3)
print(cost_data)
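A minimal sketch of the same cost as a reusable NumPy function (`softmax_cost` is a hypothetical helper name, not from the original). It normalizes over the class axis and stays in log space, which avoids taking the log of a probability that underflowed to 0. The data are regenerated with `np.random.default_rng`, so the printed value differs from the snippet above.

```python
import numpy as np

def softmax_cost(X, y, W, b):
    """Hypothetical helper: mean cross-entropy of a linear softmax classifier."""
    N = X.shape[0]
    scores = X @ W + b                                    # (N, C) class scores
    scores = scores - scores.max(axis=1, keepdims=True)   # per-row shift for stability
    # log-softmax via log-sum-exp over the class axis
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    # pick the log-probability of the correct class in each row
    return -log_probs[np.arange(N), y].mean()

rng = np.random.default_rng(0)
X = rng.standard_normal((300, 2))
y = np.repeat([0, 1, 2], 100)        # same labels as above: 100 of each class
W = rng.standard_normal((2, 3))
b = rng.standard_normal(3)
cost = softmax_cost(X, y, W, b)
print(cost)
```

With all-zero weights every class gets probability 1/3, so the cost equals log(3) ≈ 1.0986, which makes a handy sanity check.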
However, I could not reproduce this in PyTorch. So far I have:
"""
X.shape = (300, 2)  # floats
y.shape = (300,)    # integers
W.shape = (2, 3)
b.shape = (3,)
"""
import numpy as np
import torch
from torch.autograd import Variable
np.random.seed(100)
# Data and labels
X = np.random.randn(300,2)
y = np.ones(300)
y[0:100] = 0
y[200:300] = 2
y = y.astype(np.int64)
X = Variable(torch.from_numpy(X),requires_grad=True).type(torch.FloatTensor)
y = Variable(torch.from_numpy(y),requires_grad=True).type(torch.LongTensor)
# weights and bias
W = Variable(torch.randn(2,3),requires_grad=True)
b = Variable(torch.randn(3),requires_grad=True)
N = X.shape[0]
scores = torch.mm(X, W) + b
hyp = torch.exp(scores - torch.max(scores))
probs = hyp / torch.sum(hyp, dim=1, keepdim=True)  # normalize each row over the classes
correct_probs = probs[range(N),y] # got problem HERE
# logprobs = torch.log(correct_probs)
# cost_data = -1/N * torch.sum(logprobs)
# print(cost_data)
I have trouble computing the probabilities of the correct classes.
How can I fix this and get the correct cost value?
The cost function to calculate is given below:
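The formula itself is not reproduced in this copy of the question. Judging from the code and the standard definition of softmax regression (an assumption, not the original figure), it is the mean negative log-likelihood of the softmax probabilities, with $s_i = x_i W + b$ the score row of sample $i$, $y_i$ its label and $C = 3$ the number of classes:

```latex
J(W, b) = -\frac{1}{N} \sum_{i=1}^{N}
  \log \frac{\exp\left(s_{i,y_i}\right)}{\sum_{j=1}^{C} \exp\left(s_{i,j}\right)},
\qquad s_i = x_i W + b
```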
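Your problem is that PyTorch (at least the version you are using) does not accept a Python range(N) as an index. Use a LongTensor of row indices instead, such as torch.arange(0, N).long(). Do not replace it with the slice 0:N: probs[0:N, y] pairs every row with every label and returns an N-by-N matrix instead of one probability per sample. You also need to normalize each row over the classes (dim=1) rather than over the whole matrix:
hyp = torch.exp(scores - torch.max(scores))
probs = hyp / torch.sum(hyp, dim=1, keepdim=True)  # each row sums to 1
rows = torch.arange(0, N).long()  # LongTensor of row indices 0..N-1
correct_probs = probs[rows, y]  # problem solved: shape (N,)
logprobs = torch.log(correct_probs)
cost_data = -1/N * torch.sum(logprobs)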
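The difference between a slice and integer-array indexing can be checked directly. This is a minimal sketch assuming a recent PyTorch release (no Variable wrapper needed); the data are regenerated with torch.randn, so the values differ from the NumPy run. It also compares the hand-written cost with torch.nn.functional.cross_entropy, which applies log-softmax and the negative log-likelihood to the raw scores.

```python
import torch
import torch.nn.functional as F

# assumes a recent PyTorch release; tensors carry requires_grad directly
torch.manual_seed(100)
N, C = 300, 3
X = torch.randn(N, 2)
y = torch.repeat_interleave(torch.arange(C), N // C)  # int64 labels: 100 of each class
W = torch.randn(2, C, requires_grad=True)
b = torch.randn(C, requires_grad=True)

scores = X @ W + b                                          # (N, C)
hyp = torch.exp(scores - scores.max(dim=1, keepdim=True).values)
probs = hyp / hyp.sum(dim=1, keepdim=True)                  # each row sums to 1

picked = probs[torch.arange(N), y]   # one probability per sample
sliced = probs[0:N, y]               # every row combined with every label
print(picked.shape, sliced.shape)    # torch.Size([300]) torch.Size([300, 300])

cost = -torch.log(picked).mean()
ref = F.cross_entropy(scores, y)     # built-in equivalent, mean reduction
print(cost.item(), ref.item())
```

In practice F.cross_entropy on the raw scores is preferable, since it never forms the probabilities explicitly and is therefore more robust numerically.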
Another point is that your labels y do not require gradients, so you would be better off with:
y = Variable(torch.from_numpy(y),requires_grad=False).type(torch.LongTensor)
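A minimal sketch, assuming a recent PyTorch release, showing that only W and b need requires_grad. The labels stay a plain int64 tensor (recent versions raise a RuntimeError if an integer tensor is asked to require gradients), and after backward() the gradients match the closed-form softmax-regression gradient X^T (P - Y) / N, where Y is the one-hot label matrix.

```python
import torch
import torch.nn.functional as F

# assumes a recent PyTorch release (no Variable wrapper)
torch.manual_seed(0)
N, C = 300, 3
X = torch.randn(N, 2)                                  # inputs: no gradient needed
y = torch.repeat_interleave(torch.arange(C), N // C)  # int64 labels: no gradient
W = torch.randn(2, C, requires_grad=True)             # parameters: gradients wanted
b = torch.randn(C, requires_grad=True)

# Integer tensors cannot require gradients in recent PyTorch versions
try:
    y.clone().requires_grad_(True)
    int_grad_allowed = True
except RuntimeError:
    int_grad_allowed = False

scores = X @ W + b
probs = torch.softmax(scores, dim=1)
cost = -torch.log(probs[torch.arange(N), y]).mean()
cost.backward()                                       # fills W.grad and b.grad

# Closed-form gradient of the mean cross-entropy
with torch.no_grad():
    delta = probs - F.one_hot(y, C).float()           # P - Y, shape (N, C)
    grad_W = X.T @ delta / N
    grad_b = delta.mean(dim=0)

print(W.grad.shape, b.grad.shape)  # torch.Size([2, 3]) torch.Size([3])
```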