Oksisiya

Reputation: 1

How can I formalize reinforcement learning evaluation metrics?

Good morning.

I am analyzing reinforcement learning results in TensorBoard. Is it appropriate to express the two metrics below with the following formula?

Thank you.

Upvotes: 0

Views: 29

Answers (1)

Yogesh Tripathi

Reputation: 11

  1. Your first metric is absolutely valid. In fact, it is often used in online reinforcement learning: after a certain number of training episodes (an epoch), the metric is computed over all the episodes that occurred during that epoch. It is usually plotted across epochs to get an idea of the overall learning curve and sample efficiency.

  2. The second metric seems problematic. If i is the index of an episode, then V(s_i) is ill-defined, because a state should be indexed by a step within an episode, not by the episode itself. Assuming that you meant \frac{1}{N} \sum_{i=1}^{N} \sum_{t=0}^{T_i} (V(s_t) - G_t)^2, where V(s_t) is the value predicted for state s_t and G_t is the actual discounted return from time t (the quantity V(s_t) is supposed to approximate), this metric is usually called the value error, and you can use it to get an idea of how good your value function approximator is.
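
If it helps, here is a minimal sketch of how the corrected value error could be computed from logged episodes. The function names (`discounted_returns`, `value_error`), the episode layout (a pair of per-step value predictions and per-step rewards), and the default `gamma` are my own assumptions for illustration, not anything from your setup or from TensorBoard.

```python
def discounted_returns(rewards, gamma):
    """Return G_t for every step t, where G_t = r_t + gamma * G_{t+1}."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each G_t reuses the already-computed G_{t+1}.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def value_error(episodes, gamma=0.99):
    """(1/N) * sum over episodes of sum over steps of (V(s_t) - G_t)^2.

    `episodes` is a list of (values, rewards) pairs (assumed layout):
    values[t] is the predicted V(s_t), rewards[t] is the reward at step t.
    """
    total = 0.0
    for values, rewards in episodes:
        returns = discounted_returns(rewards, gamma)
        total += sum((v - g) ** 2 for v, g in zip(values, returns))
    return total / len(episodes)


# Toy example: one episode with two steps.
episodes = [([1.0, 1.0], [1.0, 1.0])]
print(value_error(episodes, gamma=0.5))
```

Note that this sums the squared errors within each episode and averages over episodes, exactly as in the formula above. If your episodes differ a lot in length, you may prefer to divide by the total number of steps instead.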

Upvotes: 0
