d3ward

Reputation: 31

Is it possible to run a Minimax search with Alpha-Beta Pruning in parallel with OpenMP?

With a basic Minimax search, it seems easy to use an OpenMP for loop to split the work between multiple threads. For instance -

#pragma omp parallel for
for (each child node)
{
    val = minimax(child, depth - 1, FALSE);
    bestValue = max(bestValue, val);
}

However, it seems like this will be impossible with Alpha-Beta pruning, at least with my understanding.

#pragma omp parallel for
for (each child node)
{
   α = max(α, alphabeta(child, depth - 1, α, β, FALSE));
   if (β ≤ α)
       break;
}

In OpenMP, it is a requirement that a For loop may only have a single entry/exit point if the loop is to be made parallel. However, Alpha-Beta pruning breaks this rule, as it is possible to break out of the loop whenever pruning needs to be accomplished (in the pseudo code above, this will happen when β is less than or equal to α).

So my question is, is there some way to work around this limitation of OpenMP? I would like to run my Alpha-Beta search in parallel using OpenMP, but this limitation has me stumped at the moment.
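The only workaround I can think of is to replace the break with a shared flag, so that every iteration still runs to the loop's single exit point but does no work once a cut-off has been requested. A rough sketch (evaluate_child is a toy stand-in for the recursive alphabeta call, so this only shows the loop structure):

```cpp
#include <algorithm>
#include <vector>

// Toy stand-in for the recursive alphabeta() call: here each "child"
// is just a precomputed leaf score. (hypothetical, for illustration)
static int evaluate_child(int leaf_score) { return leaf_score; }

// Instead of `break`, a shared flag turns the remaining iterations
// into no-ops, so the loop keeps the single entry/exit point that
// OpenMP requires.
int parallel_root_search(const std::vector<int>& children, int alpha, int beta) {
    bool pruned = false;
    #pragma omp parallel for shared(alpha, pruned)
    for (int i = 0; i < (int)children.size(); ++i) {
        bool skip;
        #pragma omp critical
        skip = pruned;
        if (skip) continue;                  // stand-in for `break`
        int val = evaluate_child(children[i]);
        #pragma omp critical
        {
            alpha = std::max(alpha, val);
            if (beta <= alpha)
                pruned = true;               // request a cut-off
        }
    }
    return alpha;
}
```

Of course this only skips work at iteration granularity; iterations that are already running when the flag is set still finish.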

Upvotes: 3

Views: 4239

Answers (3)

Jason Rodwel

Reputation: 1

I'm trying something similar to this. I first generate all legal moves in an outer method, and my minimax method takes a single position as its starting point. I'm writing in Java, by the way. For synchronization I use a lock object, and each thread sets an atomic boolean to signal when it is done. I add all the threads to a collection, then use an outer while loop that runs until the collection is empty, and an inner for loop that iterates over a copy of the list, checking the booleans to see if any thread has finished. If one has, I read its value, and if it is bigger than either the starting value of 0 or the last value retrieved, I record that node as the best so far. After the for loop, still inside the while loop, I call wait(); each thread calls notify() on the lock when it finishes. So the main thread iterates through once checking the booleans, none are ready, and the while loop waits; when a thread finishes, the main thread is notified, steps through the list to fetch the value, and removes the thread. When all threads are finished you have your move. Note that you will need to run the minimax on a copy of your board (or a cut-down version), otherwise you will get concurrency errors from threads trying out different moves on the same board. Hope this makes sense. Mine uses alpha-beta pruning, and I can search to a much deeper depth because the searches execute in parallel.

Upvotes: 0

TheSlater

Reputation: 660

As a first idea you can calculate the root moves in parallel (for example with #pragma omp parallel for schedule(dynamic, 1)), so that every thread gets its own root move and afterwards, like in a task pool, picks the next one that no thread has touched yet. Because you have to calculate every root move anyway, there is no problem with a break forcing your threads to leave the loop. When a thread is done with its root move, you can update the "best value" and use it for the next root moves. It has to be a shared value, and you have to guard it with #pragma omp critical, for example.
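A minimal sketch of this root-split idea (search_root_move is a toy stand-in for the full recursive search; only the schedule(dynamic, 1) loop and the critical-guarded shared value are the point here):

```cpp
#include <algorithm>
#include <vector>

// Toy stand-in for searching one root move with a given bound
// (a real engine would call alphabeta() here).
static int search_root_move(int move_score, int bound) {
    (void)bound;  // a real search would use the bound for cut-offs
    return move_score;
}

// Every root move is searched, so no `break` is needed.
// schedule(dynamic, 1) hands out one root move at a time, like a task
// pool; the shared best value is read and updated in critical sections.
int search_all_root_moves(const std::vector<int>& moves) {
    int best = -1000000;  // "minus infinity"
    #pragma omp parallel for schedule(dynamic, 1) shared(best)
    for (int i = 0; i < (int)moves.size(); ++i) {
        int bound;
        #pragma omp critical
        bound = best;     // tighter window from already-finished moves
        int val = search_root_move(moves[i], bound);
        #pragma omp critical
        best = std::max(best, val);
    }
    return best;
}
```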

If you have only 2-4 worker threads, that's a satisfying approach. But it has disadvantages when there are only a few root moves in a position, or when you have a very good sorting function for the moves. If the best move is calculated first, then in the sequential case the remaining moves are calculated very quickly because of the cut-offs. So it is a waste of computing power if the other threads search other root moves in parallel before the value of the probable "best move" is known. It is possible to order the moves so efficiently that in over 99% of cases the best move is calculated first. This works with helpers called the "history table", "killer moves", and "null move pruning".

The state-of-the-art parallelization is the Young Brothers Wait Concept (YBWC). There, as in Principal Variation Splitting (PVS), in every node the first move is calculated sequentially (!), and only if there is no cut-off are the alternative moves (the brothers) calculated in parallel. The brothers use the already calculated value of the first move to make fast cut-offs and only try to beat it. Nearly every open-source engine in the top 10 uses its own variation of this, but it is not easy to implement, because you can spawn tasks in every node at different depths, so it is hard to formulate with the standard OpenMP constructs. The new OpenMP 3.1 task construct can help, but I'm not sure about that. As a first place to go, you can visit https://www.chessprogramming.org/Parallel_Search to read up on the topic.
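A much-simplified sketch of the idea with OpenMP tasks (the Node type and the wide-open window are invented for illustration; a real engine would add move ordering, board copies per task, and depth limits on task spawning):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Minimal game-tree node: a leaf carries a score, an inner node its
// children. (hypothetical type, for illustration only)
struct Node {
    int score;
    std::vector<Node> children;
};

// Young Brothers Wait sketch: the first ("eldest") child is searched
// sequentially to establish a bound; only the remaining brothers are
// spawned as tasks. A shared flag replaces `break` for cut-offs.
int ybwc_search(const Node& n, int alpha, int beta, bool maximizing) {
    if (n.children.empty()) return n.score;

    // Eldest brother first, sequentially.
    int best = ybwc_search(n.children[0], alpha, beta, !maximizing);
    if (maximizing) alpha = std::max(alpha, best);
    else            beta  = std::min(beta, best);
    if (beta <= alpha) return best;     // cut-off: brothers never start

    bool cut = false;
    for (std::size_t i = 1; i < n.children.size(); ++i) {
        #pragma omp task shared(best, alpha, beta, cut) firstprivate(i)
        {
            bool skip; int a, b;
            #pragma omp critical
            { skip = cut; a = alpha; b = beta; }   // snapshot the window
            if (!skip) {
                int val = ybwc_search(n.children[i], a, b, !maximizing);
                #pragma omp critical
                {
                    if (maximizing) { best = std::max(best, val); alpha = std::max(alpha, best); }
                    else            { best = std::min(best, val); beta  = std::min(beta,  best); }
                    if (beta <= alpha) cut = true; // prune remaining brothers
                }
            }
        }
    }
    #pragma omp taskwait   // join all brothers before returning
    return best;
}
```

The top-level call would go inside #pragma omp parallel followed by #pragma omp single, the same pattern as in the Fibonacci answer on this page.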

Upvotes: 4

Raul

Reputation: 403

With the task construct you can traverse irregular structures. A tree, like the one formed during a search, is a natural structure to apply tasks to.

Typical examples are Fibonacci or solving a Sudoku:

#include <iostream>
#include <omp.h>
#include <cstdlib>
using namespace std;

unsigned long fib(unsigned long n) {
    unsigned long a = 0;
    unsigned long b = 0;

    if (n == 0) return n;
    if (n == 1) return n;

    // Each recursive call becomes an OpenMP task; the taskwait below
    // joins both children before their results are combined.
    #pragma omp task shared(a)
    a = fib(n-1);

    #pragma omp task shared(b)
    b = fib(n-2);

    #pragma omp taskwait
    return a+b;
}


int main(int argc, char** argv) {
    unsigned long result = 0;
    if (argc < 2) {
        cerr << "usage: " << argv[0] << " N" << endl;
        return -1;
    }
    long input = atol(argv[1]);
    if (input < 0) {  // an unsigned N could never fail this check
        cerr << "N is a negative number" << endl;
        return -1;
    }
    unsigned long N = (unsigned long)input;
    double start = omp_get_wtime();
    #pragma omp parallel
    {
        #pragma omp single
        result = fib(N);
    }
    double elapsed = omp_get_wtime() - start;
    cout << result << endl;
    cout << "Elapsed time: " << elapsed << endl;

    return 0;
}

This example code is too simple to amortize the overhead the tasks add, but you can test it with small numbers (fib(30) or fib(35) takes about 25 seconds with 2 threads on my machine) and see how the recursion works. I think that, combined with what the previous poster said, you can explore that path.
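A common refinement that keeps the task overhead in check is a serial fallback below a cut-off: small subproblems are computed directly instead of spawning tasks. A sketch (the cut-off of 25 is an arbitrary choice, not tuned):

```cpp
// Plain recursive Fibonacci for small inputs.
unsigned long fib_serial(unsigned long n) {
    return n < 2 ? n : fib_serial(n - 1) + fib_serial(n - 2);
}

// Task-parallel Fibonacci with a granularity cut-off: below the
// threshold the work is too small to be worth a task.
unsigned long fib_tasks(unsigned long n) {
    if (n < 25) return fib_serial(n);
    unsigned long a, b;
    #pragma omp task shared(a)
    a = fib_tasks(n - 1);
    #pragma omp task shared(b)
    b = fib_tasks(n - 2);
    #pragma omp taskwait   // join both child tasks before combining
    return a + b;
}
```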

On the other hand, alpha-beta is an inherently sequential algorithm, as the other poster noted, so it is quite tricky to get it to work in parallel without major changes.

Upvotes: 0
