Sequence with the max score?

Question

let say I have n-states S={s1,s2,s3, ..... sn } and I have a score for every transition i.e. T-matrix f.e. s1->s5 = 0.3, s4->s3 = 0.7, ....etc.

What algorithm or procedure should I use to select the best scored sequence/path starting from state-x (s_x).

Two questions :

Pick best next State, so that in infinitely long path I pick as best as possible state on average ?
Given path-length L , pick the sequence of states that will generate the highest score ?

I'm currently researching Reinforcement learning, but it seems like overkill, because I have neither Actions, nor Policies. May be I can use something like a Value function, dunno.

What would you use ?

PS>In some of the scenario T-matrix may change over time.

http://mnemstudio.org/path-finding-q-learning-tutorial.htm

It seems that Q-learning is a good bet. The only difference I see is that if I'm to store Q-values over time I have to figure way to accommodate for changing T-matrix.

And the second harder one is that there is no final goal, but only changing intermediary scores. May be I don't need to change the algorithm it will simply converge towards changing scores, which is OK I think.

My initial thoughts were on every time-step to do L-steps best path (i.e. recalculate Q every time from scratch), but if I can I will prefer to keep a changing Q table according to incoming data.

Sequence with the max score?

Answers (1)

Related Questions