sten
sten

Reputation: 7476

Sequence with the max score?

let say I have n-states S={s1,s2,s3, ..... sn } and I have a score for every transition i.e. T-matrix f.e. s1->s5 = 0.3, s4->s3 = 0.7, ....etc.

What algorithm or procedure should I use to select the best scored sequence/path starting from state-x (s_x).

Two questions :

  1. Pick best next State, so that in infinitely long path I pick as best as possible state on average ?
  2. Given path-length L , pick the sequence of states that will generate the highest score ?

I'm currently researching Reinforcement learning, but it seems like overkill, because I have neither Actions, nor Policies. May be I can use something like a Value function, dunno.

What would you use ?

PS>In some of the scenario T-matrix may change over time.


http://mnemstudio.org/path-finding-q-learning-tutorial.htm

It seems that Q-learning is a good bet. The only difference I see is that if I'm to store Q-values over time I have to figure way to accommodate for changing T-matrix.

And the second harder one is that there is no final goal, but only changing intermediary scores. May be I don't need to change the algorithm it will simply converge towards changing scores, which is OK I think.

My initial thoughts were on every time-step to do L-steps best path (i.e. recalculate Q every time from scratch), but if I can I will prefer to keep a changing Q table according to incoming data.

Upvotes: 1

Views: 145

Answers (1)

cadolphs
cadolphs

Reputation: 9617

Your option 1 would be called a greedy approach. That generally refers to approaches that pick the immediately "best" options. The problem with that is that greedy choices now can limit your optimal choices in the future.

If you don't set a limit for the path length, then obviously the max score is infinite.

So now the question is: What's the best sequence for a given path length? This can be solved in polynomial time with things like dynamic programming.

A recursive formulation (that you can then use to figure out the dynamic programming part) would say: To figure out the best path of length L starting from state x, look at all the other states y. For each of these, compute T_xy + "best path of length L-1 starting from state y".

Obviously the best path of length 1, starting from some state x, will be the "next best state", so your recursion has a simple base case.

Upvotes: 2

Related Questions