Reputation: 3
I was following Newmu's logistic regression tutorial from his GitHub and wanted to add one hidden layer to his model, so I split the weights variable into two arrays, h_w and o_w. The problem is that when I try to build the update, I can't operate on the list (w = [h_w, o_w]):
File "C:/Users/Dis/PycharmProjects/untitled/MNISTnet.py", line 32, in <module>
    update = [[w, w - gradient * 0.05]]
TypeError: can't multiply sequence by non-int of type 'float'
I'm a beginner in Theano and NumPy, and the Theano documentation didn't help me. I found the stack() function, but when I combine the weights with w = T.stack([h_w, o_w], axis=1), Theano gives me this error:
Traceback (most recent call last):
  File "C:\Users\Dis\PycharmProjects\untitled\MNISTnet.py", line 35, in <module>
    gradient = T.grad(cost=cost, wrt=w)
  File "C:\Program Files\Anaconda2\lib\site-packages\theano-0.9.0.dev1-py2.7.egg\theano\gradient.py", line 533, in grad
    handle_disconnected(elem)
  File "C:\Program Files\Anaconda2\lib\site-packages\theano-0.9.0.dev1-py2.7.egg\theano\gradient.py", line 520, in handle_disconnected
    raise DisconnectedInputError(message)
theano.gradient.DisconnectedInputError:
Backtrace when that variable is created:
  File "C:\Users\Dis\PycharmProjects\untitled\MNISTnet.py", line 30, in <module>
    w = T.stack([h_w, o_w], axis=1)
So, my question: how can I convert the list [<TensorType(float64, matrix)>, <TensorType(float64, matrix)>] into a single variable <TensorType(float64, matrix)>?
My full code is below:
import theano
from theano import tensor as T
import numpy as np
from load import mnist
def floatX(X):
    return np.asarray(X, dtype=theano.config.floatX)
def init_weights(shape):
    return theano.shared(floatX(np.random.randn(*shape) * 0.01))
def model(X, o_w, h_w):
    hid = T.nnet.sigmoid(T.dot(X, h_w))
    out = T.nnet.softmax(T.dot(hid, o_w))
    return out
trX, teX, trY, teY = mnist(onehot=True)
X = T.fmatrix()
Y = T.fmatrix()
h_w = init_weights((784, 625))
o_w = init_weights((625, 10))
py_x = model(X, o_w, h_w)
y_pred = T.argmax(py_x, axis=1)
w = [o_w, h_w]
cost = T.mean(T.nnet.categorical_crossentropy(py_x, Y))
gradient = T.grad(cost=cost, wrt=w)
print type(gradient)
update = [[w, w - gradient * 0.05]]
Upvotes: 0
Views: 136
Reputation: 1201
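When wrt is a list, T.grad(...) returns a list with one gradient per parameter, so you cannot write [w, w - gradient * 0.05]: both w and gradient are Python lists. You have to index the gradient that belongs to the parameter you are updating (gradient[0], gradient[1], ...). Also, it's not a good idea to use stack for multiple parameters: stack builds a new variable that the cost does not depend on, which is why T.grad raises DisconnectedInputError. A simple list is good enough; check this tutorial.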
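The TypeError in the question comes from plain Python rather than from Theano, since multiplying a list by a float is not defined. A minimal sketch of that, with made-up floats standing in for the shared variables and their gradients (the numbers are assumptions for illustration only):

```python
# Plain floats stand in for the Theano shared variables and their gradients.
w = [1.0, 2.0]
gradient = [0.5, 0.25]

try:
    gradient * 0.05  # same operation as in the failing update line
except TypeError as err:
    # Python reports: can't multiply sequence by non-int of type 'float'
    message = str(err)

# Indexing each element works, because the arithmetic now happens
# on the elements instead of on the list itself.
updated = [w[0] - gradient[0] * 0.05,
           w[1] - gradient[1] * 0.05]
```

The same reasoning applies when the list elements are Theano variables: the arithmetic has to be applied to each element, not to the list.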
This should work:
import theano
from theano import tensor as T
import numpy as np
from load import mnist
def floatX(X):
    return np.asarray(X, dtype=theano.config.floatX)
def init_weights(shape):
    return theano.shared(floatX(np.random.randn(*shape) * 0.01))
def model(X, o_w, h_w):
    hid = T.nnet.sigmoid(T.dot(X, h_w))
    out = T.nnet.softmax(T.dot(hid, o_w))
    return out
trX, teX, trY, teY = mnist(onehot=True)
X = T.fmatrix()
Y = T.fmatrix()
h_w = init_weights((784, 625))
o_w = init_weights((625, 10))
py_x = model(X, o_w, h_w)
y_pred = T.argmax(py_x, axis=1)
w = [o_w, h_w]
cost = T.mean(T.nnet.categorical_crossentropy(py_x, Y))
gradient = T.grad(cost=cost, wrt=w)
print type(gradient)
update = [[o_w, o_w - gradient[0] * 0.05],
          [h_w, h_w - gradient[1] * 0.05]]
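If more layers are added later, writing one line per parameter gets repetitive. One possible way to build the same list of pairs is to zip the parameters with their gradients. The helper name sgd_updates and its lr argument are hypothetical, not part of Theano; the sketch only relies on the elements supporting - and *, so it is shown here with plain floats:

```python
def sgd_updates(params, grads, lr=0.05):
    """Pair each parameter with its gradient-descent update expression.

    Hypothetical helper (not a Theano function). It assumes params and
    grads are in the same order, as they are when grads comes from
    T.grad(cost=cost, wrt=params).
    """
    if len(params) != len(grads):
        raise ValueError("params and grads must have the same length")
    return [[p, p - g * lr] for p, g in zip(params, grads)]


# Plain floats stand in for shared variables and gradient expressions.
example = sgd_updates([1.0, 2.0], [0.5, 0.25])
```

With the code above, this would be called as update = sgd_updates(w, gradient), which should produce the same pairs as the two hand-written lines.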
I suggest going through the Theano tutorials to get started.
Upvotes: 1