iacolippo

Reputation: 4513

Towers of Hanoi - Bellman equation solution

I have to implement an algorithm that solves the Towers of Hanoi game for k pods and d rings in a limited number of moves (let's say 4 pods, 10 rings, 50 moves for example) using Bellman dynamic programming equation (if the problem is solvable of course).

Now, I understand the logic behind the equation:

Bellman equation (discrete time):

V^T(x^0) = max_{a^0} [ H_0(x^0, a^0) + V^{T-1}(x^1) ]

where V^T is the objective function with horizon T, a^0 is the action at time 0, x^0 is the starting configuration, H_0 is the gain at time 0, and f(x^0, a^0) = x^1 is the transition to the next state.

The cardinality of the state space is $k^d$, and I get that a good representation for a state is a number in base k with d digits: digit i records which pod (a label from 0 to k-1) ring i sits on.
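As a sketch of that representation (the function names `encode`/`decode` are my own, not part of the question), each ring contributes one base-k digit whose value is the pod it sits on:

```python
def encode(pods, k):
    """Pack a state into a base-k number.

    pods[r] is the pod (0..k-1) holding ring r; ring 0 becomes the
    least significant digit.
    """
    state = 0
    for pod in reversed(pods):
        state = state * k + pod
    return state

def decode(state, k, d):
    """Recover the pod of each of the d rings from the encoded state."""
    pods = []
    for _ in range(d):
        pods.append(state % k)
        state //= k
    return pods

k, d = 4, 10
# all rings on pod 0 encodes to 0; all on the last pod to k**d - 1
assert encode([0] * d, k) == 0
assert encode([k - 1] * d, k) == k**d - 1
assert decode(encode([1, 3, 0, 2], k), k, 4) == [1, 3, 0, 2]
```

This makes the state space exactly the integers 0 to k^d - 1, which is convenient for indexing a value table.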

I want to minimize the number of moves for going from the initial configuration (10 rings on the first pod) to the end one (10 rings on the last pod).

What I don't get is: how do I write my objective function?

Upvotes: 1

Views: 417

Answers (2)

iacolippo

Reputation: 4513

What was actually requested was this:

def k_hanoi(npods, nrings):
    if nrings == 1 and npods > 1:  # one remaining ring: just one move
        return 1
    if npods == 3:
        return 2**nrings - 1  # the optimal 3-pod solution takes 2^d - 1 moves
    if npods > 3 and nrings > 0:
        sol = []
        for pivot in range(1, nrings):  # loop over all possible pivots
            sol.append(2 * k_hanoi(npods, pivot) + k_hanoi(npods - 1, nrings - pivot))
        return min(sol)  # minimization over the pivot


k = 4
d = 10

print(k_hanoi(k, d))

I think it is the Frame–Stewart algorithm, with optimization over the pivot chosen to divide the disks into two subgroups. I also think it was proven optimal for 4 pegs (around 2014, if I remember correctly) and conjectured to be optimal for more than 4 pegs. The limitation on the number of moves can be implemented easily.
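For example, a hedged sketch of that move-limit check (`solvable_within` is a name of my own; `k_hanoi` is repeated so the snippet runs standalone):

```python
def k_hanoi(npods, nrings):
    # Frame-Stewart recursion, as in the answer above
    if nrings == 1 and npods > 1:
        return 1
    if npods == 3:
        return 2**nrings - 1
    return min(2 * k_hanoi(npods, pivot) + k_hanoi(npods - 1, nrings - pivot)
               for pivot in range(1, nrings))

def solvable_within(npods, nrings, max_moves):
    """The instance fits the move budget iff the minimal move count does."""
    return k_hanoi(npods, nrings) <= max_moves

print(solvable_within(4, 10, 50))  # True: 4 pods, 10 rings need 49 moves
```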

The value function in this case was the number of steps needed to go from the initial configuration to the ending one, and it needed to be minimized. Thank you all for the contribution.

Upvotes: 0

ysdx

Reputation: 9335

The first thing you need to do is choose a reward function H_t(s,a), which will define your goal. Once this function is chosen, the (optimal) value function is defined and all you have to do is compute it.

The idea of dynamic programming for the Bellman equation is that you should compute V_t(s) bottom-up: you start with t=T, then t=T-1 and so on until t=0.

The initial case is simply given by:

V_T(s) = 0, ∀s

You can compute V_{T-1}(x) ∀x from V_T:

V_{T-1}(x) = max_a [ H_{T-1}(x,a) ]

Then you can compute V_{T-2}(x) ∀x from V_{T-1}:

V_{T-2}(x) = max_a [ H_{T-2}(x,a) + V_{T-1}(f(x,a)) ]

And you keep on computing V_{t-1}(x) ∀x from V_{t}:

V_{t-1}(x) = max_a [ H_{t-1}(x,a) + V_{t}(f(x,a)) ]

until you reach V_0.

Which gives the algorithm:

forall x:
  V[T](x) ← 0
for t from T-1 down to 0:
  forall x:
    V[t](x) ← max_a { H[t](x,a) + V[t+1](f(x,a)) }
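A minimal runnable sketch of this backward pass, adapted to Hanoi as a move-count minimization (so the max over rewards becomes a min over unit move costs), on a deliberately tiny 3-pod, 2-ring instance; all names here are my own:

```python
from itertools import product

K, D = 3, 2          # 3 pods, 2 rings (ring 0 is the smallest)
GOAL = (K - 1,) * D  # all rings on the last pod
T = 4                # horizon: 2**D - 1 = 3 moves suffice

def moves(state):
    """Legal successor states: move the top (smallest) ring of a pod
    onto an empty pod or onto a pod whose top ring is larger."""
    succ = []
    for src in range(K):
        rings = [r for r in range(D) if state[r] == src]
        if not rings:
            continue
        top = min(rings)  # the smallest ring on src is the movable one
        for dst in range(K):
            if dst == src:
                continue
            dst_rings = [r for r in range(D) if state[r] == dst]
            if not dst_rings or min(dst_rings) > top:
                succ.append(state[:top] + (dst,) + state[top + 1:])
    return succ

INF = float("inf")
# terminal values: V[T](x) = 0 at the goal, infinity elsewhere
V = {x: (0 if x == GOAL else INF) for x in product(range(K), repeat=D)}
for t in range(T - 1, -1, -1):  # backward in time: t = T-1, ..., 0
    V = {x: 0 if x == GOAL else
            min((1 + V[y] for y in moves(x)), default=INF)
         for x in V}

start = (0,) * D
print(V[start])  # minimum number of moves from the all-on-pod-0 start: 3
```

With 4 pods and 10 rings the same loop runs over all k^d = 4^10 states per time step, which is the point of the base-k state encoding from the question.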

Upvotes: 1
