Reputation: 31
I have created a neural network with 2 input nodes, 4 hidden nodes and 3 output nodes. The initial weights are random between -1 and 1. I used backpropagation to update the network with the TD error. However, the performance is not good.
Where might the problem be?
1. Is a bias node necessary?
2. Are eligibility traces necessary?
If anyone can provide sample code, I would be very grateful.
Upvotes: 2
Views: 958
Reputation: 1347
Yes, you should include the bias nodes, and yes, you should use eligibility traces. A bias node just gives each unit it feeds one additional tunable parameter. Think of the neural network as a "function approximator" as described in Sutton and Barto's book (free online). If the neural network has parameters theta (a vector containing all of the weights in the network), then the Sarsa update is just (using LaTeX notation):
\delta_t = r_t + \gamma*Q(s_{t+1},a_{t+1},\theta_t) - Q(s_t,a_t, \theta_t)
\theta_{t+1} = \theta_t + \alpha*\delta_t*\frac{\partial Q(s_t,a_t,\theta_t)}{\partial \theta}
This is for any function approximator Q(s,a,\theta), which estimates Q(s,a) by tuning its parameters, \theta.
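As a rough illustration of the update above, here is a minimal sketch in Python with NumPy for the 2-4-3 network from the question. The class name `QNetwork`, the tanh hidden layer, the linear output layer (one Q-value per action) and the step sizes are all assumptions made for the example, not details from the question.

```python
import numpy as np

class QNetwork:
    """Hypothetical 2-4-3 network with bias terms; one Q-value per action."""

    def __init__(self, n_in=2, n_hidden=4, n_out=3, seed=0):
        rng = np.random.default_rng(seed)
        # Weights drawn uniformly from [-1, 1], as in the question.
        self.W1 = rng.uniform(-1, 1, (n_hidden, n_in))
        self.b1 = np.zeros(n_hidden)  # hidden-layer biases
        self.W2 = rng.uniform(-1, 1, (n_out, n_hidden))
        self.b2 = np.zeros(n_out)     # output-layer biases

    def forward(self, s):
        h = np.tanh(self.W1 @ s + self.b1)  # hidden activations
        q = self.W2 @ h + self.b2           # linear outputs: Q(s, a) for each a
        return q, h

    def sarsa_update(self, s, a, r, s_next, a_next, done,
                     alpha=0.05, gamma=0.95):
        q, h = self.forward(s)
        # A terminal transition has no bootstrap term.
        q_next = 0.0 if done else self.forward(s_next)[0][a_next]
        delta = r + gamma * q_next - q[a]   # TD error

        # Gradient of Q(s, a) with respect to the hidden pre-activations.
        # Only output unit `a` contributes to Q(s, a).
        grad_pre = self.W2[a] * (1.0 - h ** 2)

        # theta <- theta + alpha * delta * dQ(s, a)/dtheta
        self.W1 += alpha * delta * np.outer(grad_pre, s)
        self.b1 += alpha * delta * grad_pre
        self.W2[a] += alpha * delta * h
        self.b2[a] += alpha * delta
        return float(delta)
```

Calling `net.sarsa_update(s, a, r, s_next, a_next, done)` once per environment step applies the update. To add eligibility traces (Sarsa(λ) with accumulating traces), keep a trace of the same shape as each parameter, update it as e ← γλe + ∂Q/∂θ, and use e in place of the gradient in the parameter update.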
However, I must ask why you're doing this. If you're just trying to get Q-learning working really well, then you should use the Fourier basis instead of a neural network:
http://all.cs.umass.edu/pubs/2011/konidaris_o_t_11.pdf
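For a feel of what the Fourier basis involves, here is a small sketch assuming the usual form of the features, cos(π c·s) for every integer coefficient vector c with entries in 0..order, after scaling the state to [0, 1]. The function name `fourier_features` and the `low`/`high` bounds arguments are made up for the example.

```python
import itertools
import numpy as np

def fourier_features(state, order, low, high):
    """Return the Fourier basis features of `state` (hypothetical helper)."""
    low = np.asarray(low, dtype=float)
    high = np.asarray(high, dtype=float)
    # Scale each state variable to [0, 1].
    s = (np.asarray(state, dtype=float) - low) / (high - low)
    # All coefficient vectors c in {0, ..., order}^d.
    coeffs = np.array(list(itertools.product(range(order + 1), repeat=len(s))))
    return np.cos(np.pi * (coeffs @ s))
```

With linear function approximation, Q(s, a) is the dot product of a per-action weight vector with these features, so the gradient in the update above is simply the feature vector itself.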
If you really want to use a neural network for RL, then you should use a natural actor-critic (NAC). NACs follow something called the "natural gradient," which was developed by Amari specifically to speed up learning using neural networks, and it makes a huge difference.
Upvotes: 2
Reputation: 365
We need more information. What is the problem domain? What are the inputs? What are the outputs?
RL can take a very long time to train and, depending on how you're training, can go from good to great to good to not-so-good during training. Therefore, you should plot the performance of your agent during learning, not just the end result.
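Per-episode returns are usually noisy, so smoothing them makes the trend easier to see. A minimal sketch, assuming you record one total reward per episode in a list (the name `moving_average` and the window size are just for illustration):

```python
def moving_average(values, window):
    """Trailing mean over the last `window` values (fewer at the start)."""
    smoothed = []
    total = 0.0
    for i, v in enumerate(values):
        total += v
        if i >= window:
            # Drop the value that just left the window.
            total -= values[i - window]
        smoothed.append(total / min(i + 1, window))
    return smoothed
```

The smoothed list can then be passed to any plotting library, for example `plt.plot(moving_average(episode_returns, 50))` with matplotlib, where `episode_returns` is your own list of returns.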
You should always use bias nodes. Eligibility traces? Probably not.
Upvotes: 0