Reputation: 229
In many reinforcement learning (RL) papers, Markov Decision Process (MDP) is a typical problem setting for RL problem. What is the real benefit of this setting? Some papers use LSTM as their policy network structure which obviously violate the MDP assumption and make more sense.
Upvotes: 0
Views: 376
Reputation: 6689
Basically, Markov Decision Processes provide a theoretical framework that allows to analyze the convergence guarantees of the algorithms as well as other theoretical properties. Although LSTM and other deep learning approaches combined with RL have reached impressive results, they lack from a solid theoretical background that allow understand or ensure when the algorithm is going to learn something useful, or how far the learned policy will be from the optimal one.
Upvotes: 2