Error training a MLP using Chainer

Question

I am trying to train and test a simple multi-layer perceptron, exactly as in the first Chainer tutorial, but with my own dataset instead of MNIST. This is the code I'm using (mostly from the tutorial):

class MLP(Chain):
    def __init__(self, n_units, n_out):
        super(MLP, self).__init__()
        with self.init_scope():
            self.l1 = L.Linear(None, n_units)
            self.l2 = L.Linear(None, n_units)
            self.l3 = L.Linear(None, n_out)
    def __call__(self, x):
        h1 = F.relu(self.l1(x))
        h2 = F.relu(self.l2(h1))
        y = self.l3(h2)
        return y

X, X_test, y, y_test, xHeaders, yHeaders = load_train_test_data('xHeuristicData.csv', 'yHeuristicData.csv')

print 'dataset shape   X:', X.shape, '  y:', y.shape

model = MLP(100, 1)
optimizer = optimizers.SGD()
optimizer.setup(model)

train = tuple_dataset.TupleDataset(X, y)
test = tuple_dataset.TupleDataset(X_test, y_test)

train_iter = iterators.SerialIterator(train, batch_size=100, shuffle=True)
test_iter = iterators.SerialIterator(test, batch_size=100, repeat=False, shuffle=False)
updater = training.StandardUpdater(train_iter, optimizer)
trainer = training.Trainer(updater, (10, 'epoch'), out='result')

trainer.extend(extensions.Evaluator(test_iter, model))
trainer.extend(extensions.LogReport())
trainer.extend(extensions.PrintReport(['epoch', 'main/accuracy', 'validation/main/accuracy']))
trainer.extend(extensions.ProgressBar())

trainer.run()

print 'Predicted value for a test example'
print model(X_test[0])

Instead of training and printing the predicted value, I get the following error at "trainer.run()":

dataset shape   X: (1003, 116)   y: (1003,)
Exception in main training loop: __call__() takes exactly 2 arguments (3 given)
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/chainer/training/trainer.py", line 299, in run
    update()
  File "/usr/local/lib/python2.7/dist-packages/chainer/training/updater.py", line 223, in update
    self.update_core()
  File "/usr/local/lib/python2.7/dist-packages/chainer/training/updater.py", line 234, in update_core
    optimizer.update(loss_func, *in_arrays)
  File "/usr/local/lib/python2.7/dist-packages/chainer/optimizer.py", line 534, in update
    loss = lossfun(*args, **kwds)
Will finalize trainer extensions and updater before reraising the exception.
Traceback (most recent call last):
  File "trainHeuristicChainer.py", line 76, in <module>
    trainer.run()
  File "/usr/local/lib/python2.7/dist-packages/chainer/training/trainer.py", line 313, in run
    six.reraise(*sys.exc_info())
  File "/usr/local/lib/python2.7/dist-packages/chainer/training/trainer.py", line 299, in run
    update()
  File "/usr/local/lib/python2.7/dist-packages/chainer/training/updater.py", line 223, in update
    self.update_core()
  File "/usr/local/lib/python2.7/dist-packages/chainer/training/updater.py", line 234, in update_core
    optimizer.update(loss_func, *in_arrays)
  File "/usr/local/lib/python2.7/dist-packages/chainer/optimizer.py", line 534, in update
    loss = lossfun(*args, **kwds)
TypeError: __call__() takes exactly 2 arguments (3 given)

I have no clue about how to deal with the error. I have successfully trained similar networks using other frameworks, but I am interested in Chainer because it is PyPy-compatible.

A tgz with the files is available here: https://mega.nz/#!wwsBiSwY!g72pC5ZgekeMiVr-UODJOqQfQZZU3lCqm9Er2jH4UD8

Haruki Kirigaya · Accepted Answer

You are sending a tuple of (X, y) into the MLP, while the implemented __call__ accepts only an x.
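
The mismatch can be reproduced without Chainer. The following is a minimal sketch that only mimics the call shown in the last traceback frame (`lossfun(*args, **kwds)`); `FakeModel` and `fake_update` are hypothetical stand-ins, not Chainer classes.

```python
class FakeModel(object):
    """Stand-in for the MLP from the question: __call__ accepts only x."""

    def __call__(self, x):
        return x


def fake_update(lossfun, *args):
    """Mimics the last traceback frame: the batch is unpacked into lossfun."""
    return lossfun(*args)


# One (X batch, y batch) pair, because the dataset holds (x, y) tuples.
in_arrays = ([[0.0] * 116], [1.0])

try:
    fake_update(FakeModel(), *in_arrays)
    error = None
except TypeError as exc:
    # Two positional arguments plus self reach a method that takes x only.
    error = exc

print(type(error).__name__)  # TypeError
```

The extra argument is the target batch y, so either `__call__` has to accept it or something else has to sit between the updater and the network.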

You can modify the implementation into

class MLP(Chain):
    def __init__(self, n_units, n_out):
        super(MLP, self).__init__()
        with self.init_scope():
            self.l1 = L.Linear(None, n_units)
            self.l2 = L.Linear(None, n_units)
            self.l3 = L.Linear(None, n_out)
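    def __call__(self, x, y):
        h1 = F.relu(self.l1(x))
        h2 = F.relu(self.l2(h1))
        predict = self.l3(h2)
        # The loss must be a scalar; y needs the same shape and dtype as
        # predict, e.g. float32 with shape (batch_size, 1).
        loss = F.mean_squared_error(predict, y)
        # or you can write it on your own as follows
        # loss = F.sum(F.square(predict - y))
        return loss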

Chainer may differ from other frameworks here: by default, the standard updater assumes that the model's __call__ is the loss function, so the call model(X, y) must return the loss of the current mini-batch. That is why the Chainer tutorial introduces a separate Classifier class, which computes the loss and keeps the MLP itself simple. Classifier makes sense for MNIST but does not suit your task, so you have to implement the loss function yourself.

Once training has finished, you can save the model instance, for example by adding the snapshot_object extension to the trainer.

To use the saved model, for example at test time, add another method to the class (named test, say) containing the same code as your current __call__. It takes only X as input, so no y is required.

Alternatively, if you would rather not add any extra method to the MLP class and keep it pure, you need to write the updater yourself and compute the loss there, which is arguably more natural. Inheriting from the standard updater is the easiest way; you could write it as follows:

class MyUpdater(chainer.training.StandardUpdater):
    def __init__(self, data_iter, model, opt, device=-1):
        super(MyUpdater, self).__init__(data_iter, opt, device=device)
        self.mlp = model

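    def update_core(self):
        batch = self.get_iterator('main').next()
        x, y = self.converter(batch, self.device)
        predict = self.mlp(x)
        # Reduce to a scalar so that backward() can start from it.
        loss = F.mean_squared_error(predict, y)
        self.mlp.cleargrads()
        loss.backward()
        # Apply the gradients through the optimizer, not the iterator.
        self.get_optimizer('main').update()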

updater = MyUpdater(train_iter, model, optimizer)
