Reputation: 735
As homework for university, we were given the task of creating a simple AI that could play a game of draughts using a minimax algorithm with alpha-beta pruning. Any other techniques we used were up to us.
In our group, we decided to use an artificial neural network as the evaluation function (mostly for fun), such that, over time, it would learn to recognize board states that are good or bad for the AI. This indeed implies that we're noobs when it comes to AI.
It's a fairly simple feed-forward neural net with 50 input neurons (for 50 squares of the board), 25 hidden neurons in a single layer, and 1 output neuron.
Output varies between -1 and 1, with a higher value designating a state that is advantageous for the player and a lower one designating a bad state. It uses tanh as the activation function.
We used the back-propagation algorithm to make it learn. When we repeatedly presented it with a set of inputs and desired outputs, it actually learned the patterns quite well!
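For reference, here is a minimal Java sketch of the kind of evaluate/learn pair we mean (the class name, weight layout and initialisation below are illustrative, not our actual code):

// Illustrative sketch of the 50-25-1 tanh network, with one backpropagation
// step on a squared-error loss. Names and initialisation are made up here.
import java.util.Random;

public class BoardNet {
    private final double[][] hiddenWeights = new double[25][50];
    private final double[] hiddenBias = new double[25];
    private final double[] outputWeights = new double[25];
    private double outputBias = 0.0;
    private final double[] hidden = new double[25]; // last hidden activations, reused by learn()

    public BoardNet(Random rng) {
        for (int j = 0; j < 25; j++) {
            for (int i = 0; i < 50; i++) {
                hiddenWeights[j][i] = (rng.nextDouble() - 0.5) * 0.1;
            }
            outputWeights[j] = (rng.nextDouble() - 0.5) * 0.1;
        }
    }

    /** Forward pass: 50 inputs (one per square), output in (-1, 1). */
    public double evaluate(double[] input) {
        for (int j = 0; j < hidden.length; j++) {
            double sum = hiddenBias[j];
            for (int i = 0; i < input.length; i++) {
                sum += hiddenWeights[j][i] * input[i];
            }
            hidden[j] = Math.tanh(sum);
        }
        double out = outputBias;
        for (int j = 0; j < hidden.length; j++) {
            out += outputWeights[j] * hidden[j];
        }
        return Math.tanh(out);
    }

    /** One backpropagation step towards the target value. */
    public void learn(double[] input, double target, double learnRate) {
        double output = evaluate(input);
        // Gradient of 0.5 * (output - target)^2 w.r.t. the output pre-activation; tanh' = 1 - tanh^2
        double outputDelta = (output - target) * (1.0 - output * output);
        for (int j = 0; j < hidden.length; j++) {
            double hiddenDelta = outputDelta * outputWeights[j] * (1.0 - hidden[j] * hidden[j]);
            outputWeights[j] -= learnRate * outputDelta * hidden[j];
            for (int i = 0; i < input.length; i++) {
                hiddenWeights[j][i] -= learnRate * hiddenDelta * input[i];
            }
            hiddenBias[j] -= learnRate * hiddenDelta;
        }
        outputBias -= learnRate * outputDelta;
    }
}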
We first put it through some initial training, where we generated random board states, and set the target value to be the ratio of "your" pieces to "opponent" pieces, always restricted to [-1,1]. The ANN appeared to also pick up this pattern quite well.
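In sketch form, that initial-training pass looked something like this (generateRandomBoard, the piece-counting helpers and the neuralnet field are made-up names, and the normalized piece difference is just one way to keep the target inside [-1, 1]):

// Sketch only: generateRandomBoard(), countOwnPieces(), countOpponentPieces()
// and the neuralnet object are assumed helpers, not our exact code.
void initialTraining(int samples) {
    for (int i = 0; i < samples; i++) {
        double[] board = generateRandomBoard();   // 50-square encoding
        double own = countOwnPieces(board);
        double opp = countOpponentPieces(board);
        // Normalized piece difference, already within [-1, 1]
        double target = (own + opp) > 0 ? (own - opp) / (own + opp) : 0.0;
        neuralnet.learn(board, target, 0.05);     // one backprop step
    }
}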
However, we also wanted it to learn from experience, where we could make it play a few hundred games, and have it learn from that. So, first, we put it through the "initial training", and then, after each game, in pseudocode, we did this:
learnRate = 0.05;

if (the game was won) {
    desireability_adjust = 0.1f;
} else {
    desireability_adjust = -0.1f;
}

foreach boardState in (board states recorded during play, in reverse order) {
    currentOutput = neuralnet.evaluate(boardState)
    targetOutput = currentOutput + desireability_adjust * (1 - abs(currentOutput))
    neuralnet.learn(boardState, targetOutput, learnRate)

    // Decrease the desireability_adjust for every step back in time
    desireability_adjust *= 0.9;
}
This gave very mixed results. The "experienced" players regularly lost against those who had just come out of initial training, and the classic minimax player that simply counted pieces slaughtered the NN-driven ones.
What changes to this approach would you advise to get the NN to learn the game? Would making the network bigger or deeper do anything? Is there something obviously bad about the post-game learning algorithm?
EDIT:
I changed the post-game algorithm to this, instead of the "adjustment" method proposed above:
learnRate = 0.1f;

if (win) {
    resultValue = 0.95   // Not 1, because tanh cannot reach 1 or -1
} else {
    resultValue = -0.95
}

// 0-based index is assumed
recordedstates = (list of states recorded during the game, in order)

for (index from recordedstates.length - 1 down to 0) {
    t = index / (recordedstates.length - 1);
    targetOutput = t * resultValue;
    nn.learn(recordedstates[index], targetOutput, learnRate)
}
The intuition is that the target value represents a "progress to the result", where a value of near 0 means the game is near the initial state, and -0.95 and 0.95 mean being close to either a loss or a win, respectively.
The homework is mostly about minimax with alpha-beta anyway, and we're going to have a competition to see who made the best AI. We'll probably lose badly, but it should yield a large number of very strong opponents to train against. Neural nets fascinate me, so I'll probably keep working on it.
What I also noticed was that some networks that lost often became "depressed", where they would report just about any state as being negative. Hopefully the new learning method above will suffer from this a lot less.
EDIT 2: I also added a few unit test cases. Most notably this one:
int[] hitPositionNearWin = {0,
EMPTY, EMPTY, EMPTY, EMPTY, EMPTY,
EMPTY, EMPTY, EMPTY, EMPTY, EMPTY,
EMPTY, EMPTY, EMPTY, EMPTY, EMPTY,
EMPTY, EMPTY, BLACKPIECE, EMPTY, EMPTY,
EMPTY, EMPTY, WHITEPIECE, EMPTY, EMPTY,
EMPTY, EMPTY, EMPTY, EMPTY, EMPTY,
EMPTY, EMPTY, EMPTY, EMPTY, EMPTY,
EMPTY, EMPTY, EMPTY, EMPTY, EMPTY,
EMPTY, EMPTY, EMPTY, EMPTY, EMPTY,
EMPTY, EMPTY, EMPTY, EMPTY, EMPTY,
};
What I expect is that the neural net returns 1 from both the black and the white player's perspective, because the board is always evaluated from the perspective of the player who is about to make a move, and whoever gets to make a move here wins.
At the moment, the neural net gives:
Is it correct that a 3-layer net (1 input, 1 hidden, 1 output layer) should be able to recognize such a situation, regardless of where it occurs on the board?
Having only a single piece left gives scores closer to the extremes, as that is a winning position, which is correct.
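The check itself is roughly the following (JUnit 4 style; evaluateForSideToMove, WHITE and BLACK are assumed helpers that encode hitPositionNearWin from the given player's point of view, and the 0.9 threshold is arbitrary):

// Sketch of the test described above; the helpers are placeholders
// for our own board-encoding code.
@Test
public void nearWinIsRecognizedFromBothPerspectives() {
    double whiteView = evaluateForSideToMove(hitPositionNearWin, WHITE);
    double blackView = evaluateForSideToMove(hitPositionNearWin, BLACK);
    assertTrue("white to move should see a near-win", whiteView > 0.9);
    assertTrue("black to move should see a near-win", blackView > 0.9);
}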
Upvotes: 3
Views: 159
Reputation: 538
This sounds to me like a task for a reinforcement learning technique. Take a look at temporal difference learning, and at the TD-Gammon articles by Tesauro. Just note that you will probably need some exploration while training. (Backgammon does not need exploration, since it gets variation from the dice rolls.)
For the best introduction to Reinforcement Learning you should read Sutton and Barto.
Edit: I think you can do the learning by training forward (instead of the backward process you suggest). Let me elaborate with some (C-inspired) pseudo code:
/* First initialize your neural net with random weights */
for (int i = 0; i < N_TRAINING_GAMES; i++) {
    board b = board_of_initial_state;
    do {
        movelist ml = find_all_legal_moves( b );

        /* Evaluate and score each move in the list of legal moves */
        for (each move m in ml) {
            board b_tmp = do_move( b, m );
            if (b_tmp == won_game_state)
                m.score = +1.0;
            else if (b_tmp == lost_game_state)
                m.score = -1.0;
            else
                m.score = neuralnet_forward_calculate( b_tmp );
        }

        /* Do not be greedy all the time. Maybe use softmax. See Sutton & Barto Ch. 2 */
        move m = pick_a_move_for_movelist( ml );

        /* Train the net: move the evaluation of the current board towards
           the score of the chosen move's successor state */
        neuralnet_backpropagation( b, m.score );

        b = do_move( b, m ); /* Update the board state */
    } while ( game_is_not_over( b ) );
}
I have not tested this code on draughts, but I would guess it will converge after a couple of thousand games. Please tell me how you progress, as this is really interesting to me.
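To make the "do not be greedy all the time" comment concrete: pick_a_move_for_movelist could, for example, do a softmax (Boltzmann) selection over the move scores. A Java-flavoured sketch (a Move class with a public double score field and the temperature value are my own assumptions here; uses java.util.List and java.util.Random):

// Softmax (Boltzmann) selection: higher-scored moves are chosen more often,
// but every legal move keeps a non-zero probability of being explored.
static Move pickMoveSoftmax(List<Move> moves, double temperature, Random rng) {
    double[] weights = new double[moves.size()];
    double total = 0.0;
    for (int i = 0; i < moves.size(); i++) {
        weights[i] = Math.exp(moves.get(i).score / temperature);
        total += weights[i];
    }
    double r = rng.nextDouble() * total;
    for (int i = 0; i < moves.size(); i++) {
        r -= weights[i];
        if (r <= 0.0) {
            return moves.get(i);
        }
    }
    return moves.get(moves.size() - 1); // guard against rounding error
}

With scores in [-1, 1], a small temperature makes play mostly greedy and a larger one explores more; you would have to tune it.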
Another edit: I have specified some more details in the code above; hopefully the flow of each training game is clear now.
The above code is what I consider a TD(0) algorithm. (If some reinforcement learning gurus disagree with me, please tell me.) It is a game-specific implementation of the algorithms you find in Sutton & Barto, Ch. 6. However, the terminology in the book is a bit more general: a state in the book is typically a board position in a game, and an action is usually a move on the game board.
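For reference, the general TD(0) update is V(s) <- V(s) + alpha * [ r + gamma * V(s') - V(s) ]. In the code above, the net is in effect trained towards the target r + gamma * V(s') of the chosen successor state (with gamma = 1 and a reward r of +1/-1 only at the end of the game), and the learning rate of the backpropagation step plays the role of alpha.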
The above code is still pseudocode; a lot of things are still missing. You should of course have a way to benchmark the strength of your agent, so that you know when it stops improving. Once the TD training is done, your static neural-net evaluation function can be used in a minimax search.
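As a sketch of that last step: the net (which already scores the board from the side to move's perspective) simply takes the place of the static evaluation at the leaves of a negamax-style alpha-beta search. Board, Move, generateMoves, applyMove, isGameOver and evaluateForSideToMove below are placeholders for whatever you already have:

// Negamax alpha-beta sketch with the trained net as the static evaluator.
// Typical root call: alphaBeta(board, searchDepth,
//                              Double.NEGATIVE_INFINITY, Double.POSITIVE_INFINITY)
static double alphaBeta(Board board, int depth, double alpha, double beta) {
    if (depth == 0 || isGameOver(board)) {
        // Net output in (-1, 1), seen from the player who is about to move
        return evaluateForSideToMove(board);
    }
    double best = Double.NEGATIVE_INFINITY;
    for (Move move : generateMoves(board)) {
        Board child = applyMove(board, move);
        double score = -alphaBeta(child, depth - 1, -beta, -alpha);
        if (score > best) {
            best = score;
        }
        if (best > alpha) {
            alpha = best;
        }
        if (alpha >= beta) {
            break; // beta cut-off: the opponent will avoid this line anyway
        }
    }
    return best;
}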
Upvotes: 3