Reputation: 79
I've been trying to implement a deep Q-learning algorithm, but I'm having an issue: it's not working. After 100,000 game plays, using 1,000 iterations to train each step (although I have tried lower numbers for both), it's still not learning. The network and game are in the linked image: https://i.sstatic.net/PaC35.jpg. Here is what happens in each training step:
double maxQval;
double[] inputvec;
int MaxQ = GetRandDir(state, out maxQval, out inputvec); // inputvec is the board encoding fed to the network
double[] QtarVec = new double[] { 0, 0, 0, 0 };
double r = GetR((int)state[0], (int)state[1]); // GetR returns the reward for the current position
QtarVec[MaxQ] = Qtar(r, maxQval); // backprop target: vector of 0's, except Qtar replaces the chosen action's value
associator.Train(50, new double[][] { inputvec }, new double[][] { QtarVec });
The training data pair for backprop is (the input shown in the linked image, QTarget = r + gamma * MaxQ), where MaxQ is the max network output-layer activation or a random one (epsilon-greedy). r is the reward obtained from each move: -10 for an obstacle and 10 for the goal (although I have also tried just 10 for the goal and 0 for everything else). Here is the training code.
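To make the target construction explicit, this is roughly what the step above boils down to (a simplified sketch of the idea; the gamma value is just for illustration, not my exact Qtar code):

double gamma = 0.9; // discount factor (value here is only illustrative)
double[] target = new double[4]; // one slot per action, all 0 by default
target[MaxQ] = r + gamma * maxQval; // only the chosen action's output gets the Q-target
// 'target' is then paired with 'inputvec' as the single training example for the network.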
public void Train(int nTrails)
{
    double[] state = new double[] { 1, 1 }; // initial position
    int its = 0;
    for (int i = 0; i < nTrails; i++)
    {
        while (((state[0] < 4) && (state[1] < 4)) && ((state[0] * 100 > 0) && (state[1] * 100 > 0)) && (state[0] != 3 && state[1] != 3)) // while on board and not at goal position
        {
            double temp = r.NextDouble();
            int next = -1;
            lines.Add(new Vector2((float)(state[0] * 100), (float)(state[1] * 100)));
            if (temp < epsilon)
            {
                next = TrainRandIt(state); // move in a random direction, backprop
            }
            else
            {
                next = TrainMaxIt(state); // move in the max activation direction, backprop
            }
            if (next == 0) // updating position
            {
                state[0]++;
            }
            else if (next == 1)
            {
                state[0]--;
            }
            else if (next == 2)
            {
                state[1]++;
            }
            else if (next == 3)
            {
                state[1]--;
            }
        }
    }
    state[0] = 1;
    state[1] = 1; // resetting game
}
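For completeness, GetR is just the grid reward lookup described above: 10 at the goal cell, -10 on an obstacle cell, 0 everywhere else. A simplified sketch (assuming the goal sits at (3, 3) as the loop's stopping condition suggests; the obstacle coordinates below are placeholders, the real layout is the one in the linked image):

int GetRSketch(int x, int y)
{
    if (x == 3 && y == 3) return 10;              // goal cell
    if ((x == 1 && y == 2) || (x == 2 && y == 1)) // hypothetical obstacle cells
        return -10;
    return 0;                                     // empty cell
}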
Any help appreciated.
Upvotes: 2
Views: 561
Reputation: 152
Judging from the linked image you provided, this is essentially a maze game where the inputs are the player's position and the output is the direction the player should move (up, down, left or right).
Here is a machine learning engine that can solve exactly that and more: the Ryskamp Learning Machine (RLM). The RLM takes a different approach from the typical machine learning engines you may have tried so far, so I suggest you follow the link I've provided to learn more about it and what makes it different.
It is written in C#, and we have an example of a maze game just like the one you are working on, which you can browse on our GitHub page or try yourself by cloning/downloading the source code together with the example apps provided.
For documentation, you may refer to the documentation files provided or the GitHub wiki.
The RLM is also available via NuGet.
Upvotes: 1