Baptiste Debes

Reputation: 49

Minimax with a-b pruning and transposition table

I'm trying to implement a minimax algorithm with alpha-beta pruning AND a transposition table. This is for a Pacman agent that may cycle, so special care must be taken about this. If a state (state of the game plus whose turn it is, Pacman's or a ghost's) is already in the transposition table and its previous occurrence is an ancestor (parent, grandparent, ...) of the current node, it can be discarded. This works for minimax without a-b pruning. From previous research, a transposition table (tt) combined with a-b seems to be much harder to implement. I'm trying to keep the code as clear as possible; it is based on the pseudo-code from Artificial Intelligence: A Modern Approach. I would like to keep the final result as close as possible to this first approach.

Each pseudo-code I found was defined in a very different way:

First pseudo-code; Second pseudo-code; Third pseudo-code

Most of the differences seem cosmetic. But none of those codes has exactly the structure I'm looking for: a minimax split into a minValue and a maxValue function with a-b pruning.
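For reference, here is roughly the structure I mean: a minimal sketch of AIMA-style minimax with separate max-value / min-value functions and a-b pruning. The is_terminal, successors and evaluate helpers are just placeholders for my game interface, not code I'm actually posting.

    # Minimal sketch of the AIMA-style structure I mean.
    # is_terminal, successors and evaluate are placeholders for the game interface.

    def alphabeta_decision(state):
        best_value, best_action = float('-inf'), None
        for action, child in successors(state):
            value = min_value(child, float('-inf'), float('inf'))
            if value > best_value:
                best_value, best_action = value, action
        return best_action

    def max_value(state, alpha, beta):
        if is_terminal(state):
            return evaluate(state)
        value = float('-inf')
        for _, child in successors(state):
            value = max(value, min_value(child, alpha, beta))
            if value >= beta:          # beta cutoff
                return value
            alpha = max(alpha, value)
        return value

    def min_value(state, alpha, beta):
        if is_terminal(state):
            return evaluate(state)
        value = float('inf')
        for _, child in successors(state):
            value = min(value, max_value(child, alpha, beta))
            if value <= alpha:         # alpha cutoff
                return value
            beta = min(beta, value)
        return value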

Thanks in advance,

Please ask for any further explanation.

Upvotes: 3

Views: 3265

Answers (1)

Spencer Evans

Reputation: 489

I'm still kind of new to more advanced AI optimization, but I'll share what I've learned. Two of the pseudo-code links (1 and 3) are both Negamax, which is trickier than minimax because it is less intuitive. The two differing implementations of Negamax in 1 and 3 require different evaluation functions and that is the main reason for their differences (more below). The second link you posted is for MTD(f) which I haven't implemented before, but I believe is different still from both Minimax and Negamax. I believe MTD(f) is considered to be faster. Lastly, the only resource I have ever seen for Minimax with transposition tables is here and I am really not sure if it is even correct. Negamax is pretty much the standard and if you can use Minimax, you can probably use Negamax instead.

While Negamax and Minimax look different, they are essentially doing the same thing. This blog post gives a pretty good description of how they're related, but doesn't explain the differences. I will try to explain why they're different below.

Why Minimax and Negamax look different but are essentially the same becomes a little clearer after considering a few things about Minimax:

  • Minimax only works for 2 player games with one player being the maximizer and another being the minimizer. Tic Tac Toe is a simple example.
  • A typical evaluation function for Minimax will return +100 if X won in a terminal state, will return -100 if O won in a terminal state, and will return 0 for a draw (a minimal sketch of this follows the list).
  • Notice how the scores are the inverse of one another. Every point gained by player 1 is a point lost for player 2. It's a zero sum game.
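To make that concrete, here is a minimal sketch of the kind of evaluation function Minimax expects, using Tic Tac Toe. winner(board) is an assumed helper that returns 'X', 'O', or None for a finished board.

    # Sketch of a Minimax evaluation for terminal Tic Tac Toe states.
    # winner(board) is an assumed helper returning 'X', 'O', or None.

    def minimax_evaluate(board):
        w = winner(board)
        if w == 'X':
            return +100   # the maximizer (X) won
        if w == 'O':
            return -100   # the minimizer (O) won
        return 0          # draw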

And a few points about Negamax:

  • Negamax also only works for 2 player zero sum games. Every point for player 1 is a point lost for player 2.
  • Negamax uses a slightly different evaluation function than Minimax. It requires that the evaluation is always done from the current player's point of view. That is to say, if in a terminal state X won and it is X's turn, the evaluation should be +100. If it is a terminal state where X won but it's O's turn, the evaluation would be -100. This is different from what Minimax expects (Minimax always wants an X win to be worth +100). Pseudo-code 1 expects this type of evaluation function.
  • Some Negamax pseudo-codes, like the Wikipedia article in 3, try to use the same evaluation function as Minimax by negating the evaluation function's value using color in this line: "return color × the heuristic value of node". This also works, but I've never done it that way (links to how I do it below). Note that the color value will only be -1 for min players. I find this way to be more confusing all around.
  • Now that the evaluation function is described... notice this line "value := max(value, −negamax(child, depth − 1, −β, −α, −color))" in pseudo-code 3. Notice that the returned value (some evaluation value), which is always from the current player's point of view, is inverted. That's because turns alternate: the evaluation came from a child state, which is the other player's turn. The alpha and beta values are also negated and swapped (see the sketch after this list).
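Putting those points together, a bare-bones Negamax with alpha-beta might look like the sketch below. It assumes an evaluation function scored from the point of view of the player to move (what pseudo-code 1 expects); is_terminal, children and evaluate_for_player_to_move are placeholder names, not taken from any of the linked pseudo-codes.

    # Bare-bones Negamax with alpha-beta pruning (no transposition table yet).
    # The evaluation is from the point of view of the player to move.

    def negamax(state, depth, alpha, beta):
        if depth == 0 or is_terminal(state):
            return evaluate_for_player_to_move(state)
        value = float('-inf')
        for child in children(state):
            # The child's score is from the opponent's point of view, so negate it;
            # the (alpha, beta) window is negated and swapped as well.
            value = max(value, -negamax(child, depth - 1, -beta, -alpha))
            alpha = max(alpha, value)
            if alpha >= beta:   # cutoff
                break
        return value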

With Minimax we're coming up with positive and negative evaluations. With Negamax we're always creating positive evaluations and then inverting them as necessary, hence Nega. This is possible because the game is zero sum, a point for player 1 is a loss of a point for player 2.

Why use Negamax? Because it's simpler. It is more challenging to implement the first time, but it makes things more concise. I also believe that transposition table entries need to be handled differently (in a more complex way) for Minimax than for Negamax. Most importantly, everyone else uses it. I wish I had a better explanation for why.

Here is the best resource I have found for implementing Transposition Tables with Negamax (most pseudo-code isn't all that helpful):

  • Iterative Deepening NegaScout with alpha beta pruning and transposition tables
  • I've also implemented vanilla Negamax with transposition tables, but I can no longer find the resources I used. To convert the above to vanilla Negamax, you simply replace lines 504 (beginning with // null window search) down to 521 with "goodness = -minimax(state, depth - 1, -beta, -alpha);". The extra lines in that code block are the "scout" part, which starts with a narrow alpha-beta search window and widens it as needed. Generally NegaScout is better than Negamax. I could share my full source, but I'd need some time to prepare something that is fit for posting to SO. (A rough sketch of vanilla Negamax with a transposition table follows this list.)
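To give a rough idea (this is a sketch of the usual approach, not my exact code), vanilla Negamax with alpha-beta and a transposition table looks something like this. Entries store the value together with a flag saying whether it is exact or only a bound, because alpha-beta cutoffs produce bounds rather than exact scores. key(state), children, is_terminal and evaluate_for_player_to_move are placeholders.

    # Sketch of vanilla Negamax with alpha-beta and a transposition table.
    # Each entry stores (depth, value, flag): EXACT score, or a LOWER/UPPER bound.

    EXACT, LOWER, UPPER = 0, 1, 2
    table = {}  # key(state) -> (depth, value, flag)

    def negamax_tt(state, depth, alpha, beta):
        alpha_orig = alpha

        # Probe the table; only trust entries searched at least this deep.
        entry = table.get(key(state))
        if entry is not None and entry[0] >= depth:
            _, value, flag = entry
            if flag == EXACT:
                return value
            if flag == LOWER:
                alpha = max(alpha, value)
            elif flag == UPPER:
                beta = min(beta, value)
            if alpha >= beta:
                return value

        if depth == 0 or is_terminal(state):
            return evaluate_for_player_to_move(state)

        value = float('-inf')
        for child in children(state):
            value = max(value, -negamax_tt(child, depth - 1, -beta, -alpha))
            alpha = max(alpha, value)
            if alpha >= beta:   # cutoff
                break

        # Classify the result before storing it.
        if value <= alpha_orig:
            flag = UPPER        # failed low: value is only an upper bound
        elif value >= beta:
            flag = LOWER        # failed high (cutoff): only a lower bound
        else:
            flag = EXACT
        table[key(state)] = (depth, value, flag)
        return value

    # Iterative deepening driver: search depth 1, 2, 3, ... reusing the table
    # between iterations (see the note on Iterative Deepening below).
    def search(root, max_depth):
        best = None
        for d in range(1, max_depth + 1):
            best = negamax_tt(root, d, float('-inf'), float('inf'))
        return best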

If for some reason you cannot implement Negamax, this is the only resource I have found for implementing Transposition Tables with Minimax.

Lastly, I want to throw out a couple things:

  • When using transposition tables you will probably want to use Iterative Deepening, as it provides a natural cutoff when time is your constraint.
  • When using transposition tables you will want to consider isomorphic boards. That is, you will want to treat reflected positions as the same board. Example: evaluating this Tic Tac Toe board XOX|---|X-- is the same as evaluating X--|---|XOX (vertical flip). Not sure if this will apply to Pacman, but it is a huge improvement when available. In Tic Tac Toe it shaves off 70-90% of search states when combined with transposition tables. Reply in a comment if you want to discuss. (A small sketch of this follows the list.)
  • If you're implementing your game in JavaScript, take note that standard 64-bit Zobrist keys won't work because JS bitwise operators work on 32 bits. There are a few different ways to handle it, but I suggest starting with just using strings as keys in a {} object.
  • If you're searching for a multiplayer AI you should be looking at Hypermax / Max-N. Minimax and Negamax fail beyond 2 players.
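To illustrate those last points, here is a small sketch (in Python, though the same idea works with strings as keys in a JS {} object) that uses plain strings as transposition-table keys and canonicalizes a Tic Tac Toe board over its rotations and reflections, so isomorphic positions share one table entry. The board is assumed to be a 9-character string in row-major order.

    # Canonical string keys for a 3x3 board: all isomorphic boards map to one key.

    def symmetries(board):
        """Yield the 8 rotations/reflections of a 3x3 board string."""
        rows = [board[0:3], board[3:6], board[6:9]]
        for _ in range(4):
            yield ''.join(rows)                       # current rotation
            yield ''.join(r[::-1] for r in rows)      # its horizontal mirror
            # rotate 90 degrees clockwise
            rows = [''.join(r[i] for r in reversed(rows)) for i in range(3)]

    def canonical_key(board):
        # Pick one representative: the lexicographically smallest variant.
        return min(symmetries(board))

    table = {}
    table[canonical_key("XOX---X--")] = 42
    print(table[canonical_key("X-----XOX")])   # 42 (same entry, vertically flipped board)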

Upvotes: 6
