Reputation: 131228
When we define a deep learning model, we do the following steps:
1. Specify how the output is determined by the input and the model's parameters.
2. Specify a cost (loss) function.
3. Fit the model's parameters by minimizing the cost on data.
It looks to me that in MXNet the first two steps are bound together. For example, this is how I define a linear transformation:
import mxnet as mx
import numpy as np

# declare a symbolic variable for the model's input
inp = mx.sym.Variable(name = 'inp')
# define how output should be determined by the input
out = mx.sym.FullyConnected(inp, name = 'out', num_hidden = 2)
# specify input and model's parameters
x = mx.nd.array(np.ones(shape = (5,3)))
w = mx.nd.array(np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]))
b = mx.nd.array(np.array([7.0, 8.0]))
# calculate output based on the input and parameters
p = out.bind(ctx = mx.cpu(), args = {'inp':x, 'out_weight':w, 'out_bias':b})
print(p.forward()[0].asnumpy())
Now, if I want to add a softmax transformation on top of it, I need to do the following:
# define the cost function
target = mx.sym.Variable(name = 'target')
cost = mx.sym.SoftmaxOutput(out, target, name = 'softmax')
y = mx.nd.array(np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]))
c = cost.bind(ctx = mx.cpu(), args = {'inp':x, 'out_weight':w, 'out_bias':b, 'target':y})
print(c.forward()[0].asnumpy())
What I do not understand is why we need to create the symbolic variable target. We would need it only if we wanted to calculate the cost, but so far we just calculate the output from the input (a linear transformation followed by a softmax).
Moreover, we have to provide a numerical value for the target just to get the output calculated. So it looks like target is required but not actually used (the value provided for it does not change the output).
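In fact, binding the same cost symbol with a different value for the target gives exactly the same forward output (a quick check, reusing the symbols defined above; y2 and c2 are just illustrative names):
# bind with an arbitrary different target value
y2 = mx.nd.array(np.zeros(shape = (5, 2)))
c2 = cost.bind(ctx = mx.cpu(), args = {'inp':x, 'out_weight':w, 'out_bias':b, 'target':y2})
# prints the same probabilities as before: the target only matters for the backward pass
print(c2.forward()[0].asnumpy())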
Finally, we can use the cost object to define a model which we can fit as soon as we have data. But what about the cost function? It has to be specified, but it is not. Basically, it looks like I am forced to use a specific cost function just because I use softmax. But why?
ADDED
For a more statistical / mathematical point of view, check here. The current question is more pragmatic / programmatic in nature: how do I decouple the output nonlinearity and the cost function in MXNet? For example, I might want to do a linear transformation and then find the model's parameters by minimizing the absolute deviation instead of the squared one.
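In other words, I would like to be able to write something like the following (a sketch reusing inp and out from above; I am assuming mx.sym.MakeLoss is the right tool for attaching an arbitrary cost):
# a target variable that is not tied to any particular output layer
target = mx.sym.Variable(name = 'target')
# mean absolute deviation as the cost, attached with MakeLoss
l1_cost = mx.sym.MakeLoss(mx.sym.mean(mx.sym.abs(out - target)))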
Upvotes: 2
Views: 123
Reputation: 980
You can use mx.sym.softmax() if you only want a softmax. mx.sym.SoftmaxOutput() contains efficient code for calculating the gradient of cross-entropy (negative log loss), which is the most common loss used with softmax. If you want to use your own loss, just use softmax and add a loss on top during training. I should note that you can also replace the SoftmaxOutput layer with a simple softmax during inference if you really want to.
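For example, a hand-written cross-entropy on top of a plain softmax could look roughly like this (a sketch using mx.sym.MakeLoss; the small constant is only there for numerical stability, and you could swap in any loss you like):
import mxnet as mx

inp = mx.sym.Variable(name = 'inp')
target = mx.sym.Variable(name = 'target')
out = mx.sym.FullyConnected(inp, name = 'out', num_hidden = 2)
# plain softmax: just the output nonlinearity, no loss attached
probs = mx.sym.softmax(out)
# custom loss on top of the probabilities
ce = -mx.sym.sum(target * mx.sym.log(probs + 1e-8), axis = 1)
loss = mx.sym.MakeLoss(mx.sym.mean(ce))
For inference you would bind and forward probs directly, leaving the loss part out.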
Upvotes: 3