Reputation: 49339
Up until now I have only used neural networks to classify a single output: I set one output neuron for each class and check which neuron has the highest/lowest activation.
What I am trying to do is to detect a pattern, and instead of outputting a single value (either a class or an activation value) I would like to output multiple values, e.g.
[0.5 0.5 0.5] -> [0.5 0.5 0.5]
[1 1 1] -> [1 1 1]
[2 2 2] -> [-1 -1 -1]
So what I am wondering is: can I use a network with 3 outputs and, instead of checking which activation is highest, use all outputs together as my output pattern?
Upvotes: 1
Views: 4091
Reputation: 15518
Yes, you can use a neural network with multiple outputs. Basically, you have two possibilities to do that:
Use a trivial decomposition, i.e. separate your training sets with respect to the responses and train three ANNs where each one has a single output. But I guess that is not what you are looking for.
Train a real multi-output neural network. In this case, for a two-layer ANN (one hidden layer), the input-layer weights are shared across all outputs, whereas the output-layer weights are specific to each output (see the sketch below). You then have to combine the backpropagation procedures of the three outputs. In a simple approach, you could do this by applying one backpropagation iteration to each output in turn, until you hopefully obtain convergence. For this to work in a reasonable way, you probably have to scale your responses appropriately (otherwise, one output might dominate the others).
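For illustration, a minimal numpy sketch of that shared-hidden-layer architecture (the sizes, variable names, and the tanh activation are my own assumptions, not part of the recipe):

```python
import numpy as np

rng = np.random.default_rng(0)

n_in, n_hidden, n_out = 3, 8, 3           # illustrative sizes
W1 = rng.normal(size=(n_hidden, n_in))    # input-layer weights, shared by all outputs
b1 = np.zeros(n_hidden)
W2 = rng.normal(size=(n_out, n_hidden))   # output-layer weights, one row per output
b2 = np.zeros(n_out)

def forward(x):
    """Forward pass: one shared hidden layer, three linear outputs."""
    h = np.tanh(W1 @ x + b1)              # shared hidden representation
    return W2 @ h + b2                    # vector of three output values

print(forward(np.array([1.0, 1.0, 1.0])))
```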
So again, here is the basic procedure for three outputs (a code sketch follows the steps):
Separate the training set into three sets, each one containing a single response. Standardize each set.
Apply one backpropagation iteration to the first data set, then one to the second, and finally one to the third data set. For each, use the same input-layer weights.
Repeat step 2 until convergence (however you define it; it should be similar to the one-dimensional output procedure).
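Here is a hedged sketch of that alternating procedure on the toy data from the question (the learning rate, hidden-layer size, epoch count, and squared-error loss are all assumptions; the standardization step is omitted for brevity):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data shaped like the question's examples: input pattern -> output pattern.
X = np.array([[0.5, 0.5, 0.5], [1.0, 1.0, 1.0], [2.0, 2.0, 2.0]])
Y = np.array([[0.5, 0.5, 0.5], [1.0, 1.0, 1.0], [-1.0, -1.0, -1.0]])

n_hidden = 8
W1 = rng.normal(scale=0.5, size=(n_hidden, 3))  # shared input-layer weights
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.5, size=(3, n_hidden))  # one output-weight row per response
b2 = np.zeros(3)
lr = 0.05                                       # assumed learning rate

for epoch in range(2000):
    for k in range(3):                     # one backprop pass per output, in turn
        for x, y in zip(X, Y):
            h = np.tanh(W1 @ x + b1)       # shared hidden activations
            err = (W2[k] @ h + b2[k]) - y[k]    # squared-error term of output k
            delta = err * W2[k] * (1.0 - h**2)  # backprop into the shared layer
            W2[k] -= lr * err * h          # update only row k of the output weights
            b2[k] -= lr * err
            W1 -= lr * np.outer(delta, x)  # update the shared weights
            b1 -= lr * delta

H = np.tanh(X @ W1.T + b1)
print(np.round(H @ W2.T + b2, 2))          # should approach Y
```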
However, as mentioned, this is only one possibility out of a large variety of optimization methods.
EDIT: The above is an extension of the simple one-dimensional backpropagation procedure. The degrees of freedom here are
(i) the order with which the inputs are processed and the error terms are calculated, and
(ii) when an update of the hidden neurons is made.
The variant described above goes over the data as [x_1, ..., x_N, y_1, ..., y_N, z_1, ..., z_N]
and updates after each step (this update scheme is often called Gauss-Seidel). The other extreme is to store the error terms and update only once after the complete set is processed. (The Gauss-Seidel version usually converges a bit faster).
Another variant -- which is probably most similar to standard backpropagation -- is to process the three dimensions of each datapoint, i.e. [x_1, y_1, z_1, ..., x_N, y_N, z_N]
, and update after each datapoint (i.e. after every third iteration). In effect, one makes a single three-dimensional gradient update (which can be done as three one-dimensional error evaluations, due to the linear nature of the gradient).
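A self-contained sketch of this per-datapoint variant, with the same assumed setup as above; the three error terms of one point are evaluated together and applied as one combined update:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0.5, 0.5, 0.5], [1.0, 1.0, 1.0], [2.0, 2.0, 2.0]])
Y = np.array([[0.5, 0.5, 0.5], [1.0, 1.0, 1.0], [-1.0, -1.0, -1.0]])
W1 = rng.normal(scale=0.5, size=(8, 3)); b1 = np.zeros(8)
W2 = rng.normal(scale=0.5, size=(3, 8)); b2 = np.zeros(3)
lr = 0.05

for epoch in range(2000):
    for x, y in zip(X, Y):                   # [x_1, y_1, z_1, ..., x_N, y_N, z_N] order
        h = np.tanh(W1 @ x + b1)
        err = (W2 @ h + b2) - y              # all three error terms of this point
        delta = (W2.T @ err) * (1.0 - h**2)  # gradients add up through the shared layer
        W2 -= lr * np.outer(err, h); b2 -= lr * err
        W1 -= lr * np.outer(delta, x); b1 -= lr * delta
```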
Summarizing, one sees that there is a large variety of possible optimization schemes, which are all very similar and probably all lead to rather similar results.
As an alternative, you can also consider using an Extreme Learning Machine. Here, you only have to train the output weights, whereas the input weights are chosen randomly. In this way, the multi-response case naturally separates into three one-dimensional optimization problems.
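A minimal sketch of that idea (the hidden-layer size and tanh activation are arbitrary choices here; training the output weights reduces to a linear least-squares solve):

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0.5, 0.5, 0.5], [1.0, 1.0, 1.0], [2.0, 2.0, 2.0]])
Y = np.array([[0.5, 0.5, 0.5], [1.0, 1.0, 1.0], [-1.0, -1.0, -1.0]])

n_hidden = 20                                   # assumed hidden-layer size
W_in = rng.normal(size=(n_hidden, X.shape[1]))  # random, fixed input weights
b_in = rng.normal(size=n_hidden)

H = np.tanh(X @ W_in.T + b_in)                  # hidden activations, shape (N, n_hidden)
# Only the output weights are trained: each column of Y defines an
# independent linear least-squares problem over the same H.
W_out, *_ = np.linalg.lstsq(H, Y, rcond=None)

print(np.round(H @ W_out, 2))                   # should reproduce Y on the training data
```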
Upvotes: 4