Sherlock
Sherlock

Reputation: 5627

Multiple Output Neural Network

I have built my first neural network in python, and i've been playing around with a few datasets; it's going well so far !

I have a quick question regarding modelling events with multiple outcomes: -

Say i wish to train a network to tell me the probability of each runner winning a 100m sprint. I would give the network all of the relevant data regarding each runner, and the number of outputs would be equal to the number of runners in the race.

My question is, using a sigmoid function, how can i ensure the sum of the outputs will be equal to 1.0 ? Will the network naturally learn to do this, or will i have to somehow make this happen explicitly ? If so, how would i go about doing this ?

Many Thanks.

Upvotes: 13

Views: 18519

Answers (3)

Vivin Paliath
Vivin Paliath

Reputation: 95498

The output from your neural network will approach 1. I don't think it will actually get to 1.

You actually don't need to see which output is equal to 1. Once you've trained your network up to a specific error level, when you present the inputs, just look for the maximum output in your output later. For example, let's say your output layer presents the following output: [0.0001, 0.00023, 0.0041, 0.99999412, 0.0012, 0.0002], then the runner that won the race is runner number 4.

So yes, your network will "learn" to produce 1, but it won't exactly be 1. This is why you train to within a certain error rate. I recently created a neural network to recognize handwritten digits, and this is the method that I used. In my output layer, I have a vector with 10 components. The first component represents 0, and the last component represents 9. So when I present a 4 to the network, I expect the output vector to look like [0, 0, 0, 0, 1, 0, 0, 0, 0, 0]. Of course, it's not what I get exactly, but it's what I train the network to provide. So to find which digit it is, I simply check to see which component has the highest output or score.

Now in your second question, I believe you're asking how the network would learn to provide the correct answer? To do this, you need to provide your network with some training data and train it until the output is under a certain error threshold. So what you need is a set of data that contains the inputs and the correct output. Initially your neural network will be set up with random weights (there are some algorithms that help you select better weights to minimize training time, but that's a little more advanced). Next you need a way to tell the neural network to learn from the data provided. So basically you give the data to the neural network and it provides an output, which is highly likely to be wrong. Then you compare that data with the expected (correct) output and you tell the neural network to update its weights so that it gets closer to the correct answer. You do this over and over again until the error is below a certain threshold.

The easiest way to do this is to implement the stochastic backpropagation algorithm. In this algorithm, you calculate the error between the actual output of the neural network and the expected output. Then you backpropagate the error from the output layer all the way up to the weights to the hidden layer, adjusting the weights as you go. Then you repeat this process until the error that you calculate is below a certain threshold. So during each step, you're getting closer and closer towards your solution.

You can use the algorithm described here. There is a decent amount of math involved, so be prepared for that! If you want to see an example of an implementation of this algorithm, you can take a look at this Java code that I have on github. The code uses momentum and a simple form of simulated annealing as well, but the standard backpropagation algorithm should be easily discernible. The Wikipedia article on backpropagation has a link to an implementation of the backpropagation algorithm in Python.

You're probably not going to understand the algorithm immediately; expect to spend some time understanding it and working through some of the math. I sat down with a pencil and paper as I was coding, and that's how I eventually understood what was going on.

Here are a few resources that should help you understand backpropagation a little better:

If you want some more resources, you can also take a look at my answer here.

Upvotes: 18

Rob Neuhaus
Rob Neuhaus

Reputation: 9290

Basically you want a function of multiple real numbers that converts those real numbers into probabilities (each between 0 to 1, sum to 1). You can this easily by post processing the output of your network.

Your network gives you real numbers r1, r2, ..., rn that increases as the probability of each runner wins the race.

Then compute exp(r1), exp(r2), ..., and sum them up for ers = exp(r1) + exp(r2) + ... + exp(rn). Then the probability that the first racer wins is exp(r1) / ers.

This is a one use of the Boltzman distribution. http://en.wikipedia.org/wiki/Boltzmann_distribution

Upvotes: 2

penelope
penelope

Reputation: 8419

Your network should work around that and learn it naturally eventually.

To make the network learn that a little faster, here's what springs to mind first:

  • add an additional output called 'sum' (summing all the other output neurons) -- if you want all the output neurons to be in an separate layer, just add a layer of outputs, first numRunners outputs just connect to corresponding neuron in the previous layer, and the last numRunners+1-th neuron you connect to all the neurons from the previous layer, and fix the weights to 1)

  • the training set would contain 0-1 vectors for each runner (did-did not run), and the "expected" result would be a 0-1 vector 00..00001000..01 first 1 marking the runner that won the race, last 1 marking the "sum" of "probabilities"

  • for the unknown races, the network would try to predict which runner would win. Since the outputs have contiguous values (more-or-less :D) they can be read as "the certainty of the network that the runner would win the race" -- which is what you're looking for

Even without the additional sum neuron, this is the rough description of the way the training data should be arranged.

Upvotes: 1

Related Questions