Which reinforcement learning algorithm is applicable to a problem with a continuously variable reward and no intermediate rewards?

Question

I think the title says it. A "game" takes a number of moves to complete, at which point a total score is computed. The goal is to maximize this score, and there are no rewards provided for specific moves during the game. Is there an existing algorithm that is geared toward this type of problem?

EDIT: By "continuously variable" reward, I mean it is a floating point number, not a win/loss binary. So you can't, for example, respond to "winning" by reinforcing the moves made to get there. All you have is a number. You can rank different runs in order of preference, but a single result is not especially meaningful.
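To make the setup concrete, here is a minimal sketch of one candidate approach: REINFORCE-style policy-gradient updates with a running-average baseline. This is an illustration under stated assumptions, not a claim that it is the best fit. The toy `play` function, the game length, the number of actions, and the learning rates are all hypothetical stand-ins. The baseline reflects the point that a single score only means something relative to other runs: each game's moves are reinforced in proportion to how much its final score beat the recent average.

```python
import math
import random

N_MOVES = 5    # hypothetical game length
N_ACTIONS = 3  # hypothetical number of choices per move


def play(moves):
    """Toy stand-in for the game: one float score, returned only at the end."""
    # Hypothetical scoring rule: higher-numbered actions are worth more.
    return sum(0.5 * a for a in moves)


def softmax(logits):
    """Convert a list of logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train(episodes=3000, lr=0.05, baseline_rate=0.05, seed=0):
    rng = random.Random(seed)
    # One independent softmax policy per move (the game has no state here).
    logits = [[0.0] * N_ACTIONS for _ in range(N_MOVES)]
    baseline = 0.0  # running average of past final scores

    for _ in range(episodes):
        probs = [softmax(row) for row in logits]
        moves = [rng.choices(range(N_ACTIONS), weights=p)[0] for p in probs]
        score = play(moves)  # the only feedback: one number after the last move

        # A score is judged relative to the recent average, not in isolation.
        advantage = score - baseline

        # Every move in the game shares the same credit signal.
        for t, a in enumerate(moves):
            for k in range(N_ACTIONS):
                # Gradient of log softmax probability of the chosen action.
                grad = (1.0 if k == a else 0.0) - probs[t][k]
                logits[t][k] += lr * advantage * grad

        baseline += baseline_rate * (score - baseline)

    return logits


logits = train()
greedy = [row.index(max(row)) for row in logits]
print("greedy moves:", greedy, "score:", play(greedy))
```

Because every move receives the same end-of-game signal, the updates are noisy, and more moves per game means more games are needed before the signal for any single move stands out. A real game would also need a policy that conditions on the game state rather than only on the move index.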
