Reputation: 53876
As part of Q-learning, an objective is to maximize the expected utility.
The Wikipedia article (https://en.wikipedia.org/wiki/Q-learning) describes expected utility in the following contexts:
It works by learning an action-value function that ultimately gives the expected utility of taking a given action in a given state and following the optimal policy thereafter.
One of the strengths of Q-learning is that it is able to compare the expected utility of the available actions without requiring a model of the environment.
But it does not define what utility is. What is meant by utility?
When maximizing utility, what exactly is being maximized?
Upvotes: 1
Views: 3707
Reputation: 6689
In general terms, utility means being profitable or beneficial (as @Rob posted in his response).
In the Q-learning context, utility is closely related to the action-value function (the two can be viewed as synonyms), as you read in the Wikipedia explanation. Here, the action-value function of policy π is an estimate of the return (long-term reward) that the agent will obtain if it performs action a in a given state s and follows policy π thereafter. So when you maximize the utility, you are actually maximizing the rewards your agent will obtain. As rewards are defined to achieve a goal, you are maximizing the "quantity" of goal achieved.
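To make "return" concrete, here is a minimal sketch of tabular Q-learning in Python. The environment is a made-up 4-state chain, not anything from the question: the agent starts in state 0, can move left (action 0) or right (action 1), and gets a reward of 1 only on reaching the last state. The function names and the values of `gamma`, `alpha` and `epsilon` are illustrative assumptions. The point is that the learned Q-value for the best action should end up close to the discounted sum of rewards collected along the best path, which is the quantity being maximized.

```python
import random


def discounted_return(rewards, gamma):
    """Long-term reward: each reward is discounted by gamma per time step."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def q_learning(n_states=4, gamma=0.9, alpha=0.5, epsilon=0.2,
               episodes=500, seed=0):
    """Tabular Q-learning on a hypothetical chain environment.

    States are 0..n_states-1, actions are 0 (left) and 1 (right).
    Entering the last state gives reward 1 and ends the episode.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    goal = n_states - 1
    for _ in range(episodes):
        s = 0
        while s != goal:
            # Epsilon-greedy: mostly exploit the current estimates.
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                a = 0 if q[s][0] > q[s][1] else 1
            s_next = max(s - 1, 0) if a == 0 else s + 1
            r = 1.0 if s_next == goal else 0.0
            # Target: immediate reward plus discounted best future value
            # (no future value once the episode has ended).
            future = 0.0 if s_next == goal else max(q[s_next])
            q[s][a] += alpha * (r + gamma * future - q[s][a])
            s = s_next
    return q


q = q_learning()
# Rewards along the best path from state 0: right, right, right.
best_path_return = discounted_return([0.0, 0.0, 1.0], gamma=0.9)
print(q[0][1], best_path_return)
```

The two printed numbers should agree closely: the "utility" of going right in state 0 is the discounted return the agent expects to collect from there on if it keeps acting optimally.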
Upvotes: 2
Reputation: 15168
In this case, "utility" means functionality or usefulness. So "maximum functionality" or "maximum usefulness".
Plugging the word into Google gives you:
the state of being useful, profitable, or beneficial.
Upvotes: 2