I'm implementing an AI controller for the two-player version of the board game Quoridor in Python, using the alpha-beta pruning variant of the Negamax algorithm. I expected that increasing the search depth would produce a stronger agent, but that is not what happens in my implementation. Going from depth 3 to depth 4 (greater depths currently have unacceptable computation times), the agent goes from playing reasonably well to playing really awful matches. I'm fairly sure (based on the literature I found and on the logic of the game) that the heuristic I'm using is effective, so I can't figure out what the problem is. This is my alpha-beta pruning implementation:
#self._h_type: heuristic type used by the agent
#self._env: board in a given status
def alphaBeta(depth,alpha,beta):
    if self._env.isTerminal() or not depth:
        return (self._env.getHeuristic(self._h_type),None)
    else:
        max_value=(-math.inf,None)
        for a in self._env.getActions():
            self._env.update(a)
            x=alphaBeta(depth-1,(-beta[0],beta[1]),(alpha[0],alpha[1]))
            if -x[0]>max_value[0]:
                max_value=(-x[0],a)
            elif -x[0]==max_value[0] and uniform(0,1)>0.5:
                max_value=(-x[0],a)
            self._env.undo()
            if max_value[0]>alpha[0]: alpha=max_value
            if alpha[0]>=beta[0]: return alpha
        return max_value

move=alphaBeta(self._depth,(-math.inf,None),(math.inf,None))
self._env.update(move[1])
The heuristic function computes the difference between the lengths of the two players' shortest paths to their goal rows (the opponent's minus the current player's, so a positive value means the player to move is closer), which is an objective measure of advantage in this game:
#shortest path heuristic.
#+/- 1000 for terminal state
#difference of shortest paths length to the goal rows
def getSpHeuristic(self):
    if self.isTerminal(): return self.getWmHeuristic()
    else:
        p1=self.getMovingPawn()
        p2=self.getOpponentPawn()
        return len(self._graph.shortestPath(p2.getPosition(),p2.getGoalRow()))\
               -len(self._graph.shortestPath(p1.getPosition(),p1.getGoalRow()))
Upvotes: 1
Views: 531
First, in terms of your implementation, there are a couple of possible issues. The most important one: in the recursive call you negate beta but pass alpha through unchanged. Negamax needs the window negated and swapped, i.e. the child should be searched with alpha = -beta and beta = -alpha. As posted, the child's beta is the parent's un-negated alpha; at the root, for instance, the child receives both bounds at -infinity, so it cuts off after its first move. It is also easier to get the bound arithmetic right if alpha and beta are plain numbers and the best move is tracked in a separate variable, rather than carried inside the tuples.
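If it helps, here is a minimal sketch of how the window is usually handled in a fail-soft negamax. It reuses self._env, getActions, update, undo, getHeuristic and self._h_type from your snippet and assumes it lives on the same agent class as your code; the negamax name and the rest of the structure are illustrative, not a drop-in fix:

import math

# sketch only: fail-soft negamax with the window negated AND swapped on recursion
def negamax(self, depth, alpha, beta):
    if self._env.isTerminal() or depth == 0:
        return (self._env.getHeuristic(self._h_type), None)
    best_value, best_move = -math.inf, None
    for a in self._env.getActions():
        self._env.update(a)
        # the child is searched with (-beta, -alpha): both bounds negated and swapped
        value = -self.negamax(depth - 1, -beta, -alpha)[0]
        self._env.undo()
        if value > best_value:
            best_value, best_move = value, a
        alpha = max(alpha, value)
        if alpha >= beta:  # beta cutoff
            break
    return (best_value, best_move)

# root call would look like: value, move = self.negamax(self._depth, -math.inf, math.inf)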
Second, you're actually asking the question you'll want to ask constantly for your agent: will the gain from additional exploration exceed the loss from the extra computation time it incurs?
Third, for adversarial search specifically, the problem you're having is expected. At larger depths you can run out of time and have to return an answer before the tree has been fully traversed. In such cases the answer is usually worse than a completed search at a smaller depth.
There are all sorts of tricks to increase search while reducing compute. Alpha-beta pruning is one such method. So is negamax.
This site on chess-ai has a good collection of the various tricks one can employ in alpha-beta. Iterative deepening might be the next trick for you to try out.
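For example, here is a minimal sketch of iterative deepening wrapped around a search like the negamax sketch above; the time_limit parameter, its one-second default and the stopping rule are illustrative assumptions, not something from your code:

import time
import math

# sketch only: iterative deepening with an illustrative time budget
def iterativeDeepening(self, max_depth, time_limit=1.0):
    start = time.time()
    best = None
    for depth in range(1, max_depth + 1):
        # each finished iteration overwrites 'best', so stopping early still
        # leaves the result of the deepest search that completed
        best = self.negamax(depth, -math.inf, math.inf)
        if time.time() - start > time_limit:
            break
    return best

A completed shallow iteration also gives you a best move to try first at the next depth, which tends to improve move ordering and therefore how much alpha-beta can prune.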
Upvotes: 1