idk

Reputation: 71

Minimax with alpha-beta pruning problems

I'm making a C++ program for the game chopsticks.

It's a really simple game with only 625 total game states (even fewer if you account for symmetry and unreachable states). I have read up on the minimax and alpha-beta pruning algorithms, mostly as applied to tic-tac-toe, but the problem I ran into is that in tic-tac-toe it's impossible to loop back to a previous state, while that can easily happen in chopsticks. So when I ran the code it ended in a stack overflow.

I fixed this by adding flags for previously visited states (I don't know if that's the right way to do it) so that they are avoided, but now the problem I have is that the output is not symmetric as expected.

For example, in the start state of the game each player has one finger on each hand, so everything is symmetric. The program tells me that the best move is to hit my right hand with my left, but not the opposite.

My source code is -

#include <iostream>
#include <array>
#include <vector>
#include <limits>
std::array<int, 625> t; //Visited/score flags for the maximizing player.
std::array<int, 625> f; //Visited/score flags for the minimizing player.
int no = 0; //Unused. For debugging.
class gamestate
{
public:
    gamestate(int x, bool t) : turn(t) //Constructor.
    {
        for (int i = 0; i < 2; i++)
            for (int j = 0; j < 2; j++) {
                val[i][j] = x % 5;
                x /= 5;
            }
        init();
    }
    void print() //Unused. For debugging.
    {
        for (int i = 0; i < 2; i++) {
            for (int j = 0; j < 2; j++)
                std::cout << val[i][j] << "\t";
            std::cout << "\n";
        }
        std::cout << "\n";
    }
    std::array<int, 6> canmove = {{ 1, 1, 1, 1, 1, 1 }}; //List of available moves.
    bool isover() //Is the game over.
    {
        return ended;
    }
    bool won() //Who won the game.
    {
        return winner;
    }
    bool isturn() //Whose turn it is.
    {
        return turn;
    }
    std::vector<int> choosemoves() //Choose the best possible moves in the current state.
    {
        std::vector<int> bestmoves;
        if(ended)
            return bestmoves;
        std::array<int, 6> scores;
        int bestscore;
        if(turn)
            bestscore = std::numeric_limits<int>::min();
        else
            bestscore = std::numeric_limits<int>::max();
        scores.fill(bestscore);
        for (int i = 0; i < 6; i++)
            if (canmove[i]) {
                t.fill(0);
                f.fill(0);
                gamestate *play = new gamestate(this->playmove(i),!turn);
                scores[i] = minimax(play, 0, std::numeric_limits<int>::min(), std::numeric_limits<int>::max());
                std::cout<<i<<": "<<scores[i]<<std::endl;
                delete play;
                if (turn) if (scores[i] > bestscore) bestscore = scores[i];
                if (!turn) if (scores[i] < bestscore) bestscore = scores[i];
            }
        for (int i = 0; i < 6; i++)
            if (scores[i] == bestscore)
                bestmoves.push_back(i);
        return bestmoves;
    }
private:
    std::array<std::array<int, 2>, 2 > val; //The values of the fingers.
    bool turn; //Whose turn it is.
    bool ended = false; //Has the game ended.
    bool winner; //Who won the game.
    void init() //Check if the game has ended and find the available moves.
    {
        if (!(val[turn][0]) && !(val[turn][1])) {
            ended = true;
            winner = !turn;
            canmove.fill(0);
            return;
        }
        if (!(val[!turn][0]) && !(val[!turn][1])) {
            ended = true;
            winner = turn;
            canmove.fill(0);
            return;
        }
        if (!val[turn][0]) {
            canmove[0] = 0;
            canmove[1] = 0;
            canmove[2] = 0;
            if (val[turn][1] % 2)
                canmove[5] = 0;
        }
        if (!val[turn][1]) {
            if (val[turn][0] % 2)
                canmove[2] = 0;
            canmove[3] = 0;
            canmove[4] = 0;
            canmove[5] = 0;
        }
        if (!val[!turn][0]) {
            canmove[0] = 0;
            canmove[3] = 0;
        }
        if (!val[!turn][1]) {
            canmove[1] = 0;
            canmove[4] = 0;
        }
    }
    int playmove(int mov) //Play a move to get the next game state.
    {
        auto newval = val;
        switch (mov) {
        case 0:
            newval[!turn][0] = (newval[turn][0] + newval[!turn][0]);
            newval[!turn][0] = (5 > newval[!turn][0]) ? newval[!turn][0] : 0;
            break;
        case 1:
            newval[!turn][1] = (newval[turn][0] + newval[!turn][1]);
            newval[!turn][1] = (5 > newval[!turn][1]) ? newval[!turn][1] : 0;
            break;
        case 2:
            if (newval[turn][1]) {
                newval[turn][1] = (newval[turn][0] + newval[turn][1]);
                newval[turn][1] = (5 > newval[turn][1]) ? newval[turn][1] : 0;
            } else {
                newval[turn][0] /= 2;
                newval[turn][1] = newval[turn][0];
            }
            break;
        case 3:
            newval[!turn][0] = (newval[turn][1] + newval[!turn][0]);
            newval[!turn][0] = (5 > newval[!turn][0]) ? newval[!turn][0] : 0;
            break;
        case 4:
            newval[!turn][1] = (newval[turn][1] + newval[!turn][1]);
            newval[!turn][1] = (5 > newval[!turn][1]) ? newval[!turn][1] : 0;
            break;
        case 5:
            if (newval[turn][0]) {
                newval[turn][0] = (newval[turn][1] + newval[turn][0]);
                newval[turn][0] = (5 > newval[turn][0]) ? newval[turn][0] : 0;
            } else {
                newval[turn][1] /= 2;
                newval[turn][0] = newval[turn][1];
            }
            break;
        default:
            std::cout << "\nInvalid move!\n";
        }
        int ret = 0;
        for (int i = 1; i > -1; i--)
            for (int j = 1; j > -1; j--) {
                ret+=newval[i][j];
                ret*=5;
            }
        ret/=5;
        return ret;
    }
    static int minimax(gamestate *game, int depth, int alpha, int beta) //Minimax searching function with alpha beta pruning.
    {
        if (game->isover()) {
            if (game->won())
                return 1000 - depth;
            else
                return depth - 1000;
        }
        if (game->isturn()) {
            for (int i = 0; i < 6; i++)
                if (game->canmove[i]&&t[game->playmove(i)]!=-1) {
                    int score;
                    if(!t[game->playmove(i)]){
                        t[game->playmove(i)] = -1;
                        gamestate *play = new gamestate(game->playmove(i),!game->isturn());
                        score = minimax(play, depth + 1, alpha, beta);
                        delete play;
                        t[game->playmove(i)] = score;
                    }
                    else
                        score = t[game->playmove(i)];
                    if (score > alpha) alpha = score;
                    if (alpha >= beta) break;
                }
            return alpha;
        } else {
            for (int i = 0; i < 6; i++)
                if (game->canmove[i]&&f[game->playmove(i)]!=-1) {
                    int score;
                    if(!f[game->playmove(i)]){
                        f[game->playmove(i)] = -1;
                        gamestate *play = new gamestate(game->playmove(i),!game->isturn());
                        score = minimax(play, depth + 1, alpha, beta);
                        delete play;
                        f[game->playmove(i)] = score;
                    }
                    else
                        score = f[game->playmove(i)];
                    if (score < beta) beta = score;
                    if (alpha >= beta) break;
                }
            return beta;
        }
    }
};
int main(void)
{
    gamestate test(243, true);
    auto movelist = test.choosemoves();
    for(auto i: movelist)
        std::cout<<i<<std::endl;
    return 0;
}

I'm passing the states around in a sort of base-5-to-decimal encoding, as each hand can have a value from 0 to 4.

In the code I have input the state -

3    3

4    1

The output says I should hit with my right hand (1) onto the opponent's right (3), but it does not say I should also hit it onto my opponent's left (also 3).

I think the problem is because of the way I handled infinite looping.

What would be the right way to do it? Or if that is the right way, then how do I fix the problem?

Also please let me know how I can improve my code.

Thanks a lot.

Edit:

I have changed my minimax function as follows to ensure that infinite loops are scored above losing, but I'm still not getting symmetry. I also made a function that folds depth back into a pre-computed score.

static float minimax(gamestate *game, int depth, float alpha, float beta) //Minimax searching function with alpha beta pruning.
    {
        if (game->isover()) {
            if (game->won())
                return 1000 - std::atan(depth) * 2000 / std::acos(-1);
            else
                return std::atan(depth) * 2000 / std::acos(-1) - 1000;
        }
        if (game->isturn()) {
            for (int i = 0; i < 6; i++)
                if (game->canmove[i]) {
                    float score;
                    if(!t[game->playmove(i)]) {
                        t[game->playmove(i)] = -1001;
                        gamestate *play = new gamestate(game->playmove(i), !game->isturn());
                        score = minimax(play, depth + 1, alpha, beta);
                        delete play;
                        t[game->playmove(i)] = score;
                    } else if(t[game->playmove(i)] == -1001)
                        score = 0;
                    else
                        score = adddepth(t[game->playmove(i)], depth);
                    if (score > alpha) alpha = score;
                    if (alpha >= beta) break;
                }
            return alpha;
        } else {
            for (int i = 0; i < 6; i++)
                if (game->canmove[i]) {
                    float score;
                    if(!f[game->playmove(i)]) {
                        f[game->playmove(i)] = -1001;
                        gamestate *play = new gamestate(game->playmove(i), !game->isturn());
                        score = minimax(play, depth + 1, alpha, beta);
                        delete play;
                        f[game->playmove(i)] = score;
                    } else if(f[game->playmove(i)] == -1001)
                        score = 0;
                    else
                        score = adddepth(f[game->playmove(i)], depth);
                    if (score < beta) beta = score;
                    if (alpha >= beta) break;
                }
            return beta;
        }
    }

This is the function to add depth -

float adddepth(float score, int depth) //Add depth to pre-calculated score.
{
    int olddepth;
    float newscore;
    if(score > 0) {
        olddepth = std::tan((1000 - score) * std::acos(-1) / 2000);
        depth += olddepth;
        newscore = 1000 - std::atan(depth) * 2000 / std::acos(-1);
    } else {
        olddepth = std::tan((1000 + score) * std::acos(-1) / 2000);
        depth += olddepth;
        newscore = std::atan(depth) * 2000 / std::acos(-1) - 1000;
    }
    return newscore;
}

Upvotes: 0

Views: 1508

Answers (1)

Colin P. Hill

Reputation: 422

Disclaimer: I don't know C++, and I frankly haven't bothered to read the game rules. I have now read the rules, and still stand by what I said...but I still don't know C++. Still, I can present some general knowledge of the algorithm which should set you off in the right direction.

Asymmetry is not in itself a bad thing. If two moves are exactly equivalent, it should choose one of them and not stand helpless like Buridan's ass. You should, in fact, be sure that any agent you write has some method of choosing arbitrarily between policies which it cannot distinguish.

You should think more carefully about the utility scheme implied by refusing to visit previous states. Pursuing an infinite loop is a valid policy, even if your current representation of it will crash the program; maybe the bug is the overflow, not the policy that caused it. If given the choice between losing the game, and refusing to let the game end, which do you want your agent to prefer?

Playing ad infinitum

If you want your agent to avoid losing at all costs -- that is, you want it to prefer indefinite play over loss -- then I would suggest treating any repeated state as a terminal state and assigning it a value somewhere between winning and losing. After all, in a sense it is terminal -- this is the loop the game will enter forever and ever and ever, and the definite result of it is that there is no winner. However, remember that if you are using simple minimax (one utility function, not two), then this implies that your opponent also regards eternal play as a middling result.

It may sound ridiculous, but maybe playing unto infinity is actually a reasonable policy. Remember that minimax assumes the worst case -- a perfectly rational foe whose interests are the exact opposite of yours. But if, for example, you're writing an agent to play against a human, then the human will either err logically, or will eventually decide they would rather end the game by losing -- so your agent will benefit from patiently staying in this Nash equilibrium loop!

Alright, let's end the game already

If you want your agent to prefer that the game end eventually, then I would suggest implementing a living penalty -- a modifier added to your utility which decreases as a function of time (be it asymptotic or without bound). Implemented carefully, this can guarantee that, eventually, any end is preferable to another turn. With this solution as well, you need to be careful about considering what preferences this implies for your opponent.

A third way

Another common solution is to depth-limit your search and implement an evaluation function. This takes the game state as its input and just spits out a utility value which is its best guess at the end result. Is this provably optimal? No, not unless your evaluation function is just completing the minimax, but it means your algorithm will finish within a reasonable time. By burying this rough estimate deep enough in the tree, you wind up with a pretty reasonable model. However, this produces an incomplete policy, which means that it is more useful for a replanning agent than for a standard planning agent. Minimax replanning is the usual approach for complex games (it is, if I'm not mistaken, the basic algorithm followed by Deep Blue), but since this is a very simple game you probably don't need to take this approach.

A side note on abstraction

Note that all of these solutions are conceptualized as either numeric changes to or estimations of the utility function. This is, in general, preferable to arbitrarily throwing away possible policies. After all, that's what your utility function is for -- any time you make a policy decision on the basis of anything except the numeric value of your utility, you are breaking your abstraction and making your code less robust.

Upvotes: 2
