Reputation: 83
I have just started studying Q-learning and would like to use it to solve my problem.
Problem: I need to detect certain combinations of data. Four matrices act as inputs to my system, and I have already categorised each input as either Low (L) or High (H). I need to detect certain input types, for example LLLH, LLHH, HHHH, etc.
NOTE: 1) LLLH means the first input is L, the second is L, the third is L, and the fourth is H. 2) I have labelled each input type as a state, for example LLLL is state 1, LLLH is state 2, and so on.
From what I have studied, Q-learning usually has one goal (a single goal state), which makes it easier for the agent to learn and build the Q-matrix from the R-matrix. In my problem I have many goals (many states act as goals and need to be detected). I don't know how to design the states, how to create the reward matrix with many goals, or how the agent will learn. Can you please help me use Q-learning in this kind of situation? Note that I have about 16 goals in 20+ states.
As mentioned above, I know what Q-learning is, how states and goals work, and how the Q-matrix is calculated (how it learns). The problem is that with many goals I don't know how to relate my problem to Q-learning: how many states I need, and how to assign the rewards.
At the very least, I need help creating a reward matrix with many goals.
Upvotes: 2
Views: 2510
Reputation: 555
Multi-goal RL is being actively investigated because it addresses some critical RL problems.
Here is a great article in which the agent has to both deliver packages and recharge its battery. If it never recharges, the deliveries will fail, but if it charges constantly, it will not make any deliveries. The task is a balance between these two important goals.
The author talks you through the logic and the approach in TensorFlow: https://www.oreilly.com/ideas/reinforcement-learning-for-complex-goals-using-tensorflow
Upvotes: 0
Reputation: 6434
At the very least, I need help creating a reward matrix with many goals.
The simplest way is to define a reward for each goal and then combine them into a total reward as a weighted sum:
Rtot = w1 * R1 + w2 * R2 + ... + wn * Rn
You can then decide how to weight each reward. The weights affect the final behavior of the agent, because each choice of weights makes it learn something different.
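As a rough illustration of the weighted sum for the L/H states in the question, here is a minimal Python sketch. The names `STATES`, `goal_rewards` and `total_reward`, the choice of three goals, the 1.0/0.0 per-goal rewards and the weight values are all assumptions made up for this example, not something the question specifies.

```python
from itertools import product

# Assumed labelling, following the question: LLLL -> state 1, LLLH -> state 2, ...
STATES = {"".join(p): i + 1 for i, p in enumerate(product("LH", repeat=4))}


def goal_rewards(state, goals):
    """Return one reward per goal: 1.0 if `state` is that goal, else 0.0 (assumed scheme)."""
    return [1.0 if state == g else 0.0 for g in goals]


def total_reward(rewards, weights):
    """Weighted sum Rtot = w1 * R1 + w2 * R2 + ... + wn * Rn."""
    if len(rewards) != len(weights):
        raise ValueError("need exactly one weight per reward")
    return sum(w * r for w, r in zip(weights, rewards))


# Hypothetical goals and weights, chosen only for illustration.
goals = ["LLLH", "LLHH", "HHHH"]
weights = [1.0, 0.5, 2.0]

# Total reward for every state; this could serve as the reward for entering that state.
reward_by_state = {s: total_reward(goal_rewards(s, goals), weights) for s in STATES}
```

With these assumed weights, reaching HHHH is rewarded more than reaching LLHH, so the agent would be pushed to prefer the former. Changing the weights changes that preference.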
There are also more sophisticated approaches, known as "Multi-dimensional Reward RL" or "Multi-criteria RL". You can search for these terms to find related papers.
Upvotes: 0