Reputation: 414
I have tried implementing an algorithm with both a dictionary comprehension and a for loop, both of which I believed were set to achieve the same results.
for i in range(num_iter):
new_state_values = {s: get_new_state_value(mdp, state_values, s, gamma) for s in mdp.get_all_states()}
for i in range(num_iter):
for s in mdp.get_all_states():
new_state_values[s] = get_new_state_value(mdp, state_values, s, gamma)
as my algorithm runs, these achieve very different results. Can someone point out to me where the difference lies in the two?
the full algorithm is below
# parameters
gamma = 0.9 # discount for MDP
num_iter = 100 # maximum iterations, excluding initialization
min_difference = 0.001 # stop VI if new values are this close to old values (or closer)
# initialize V(s)
state_values = {s: 0 for s in mdp.get_all_states()}
for i in range(num_iter):
new_state_values = {s: get_new_state_value(mdp, state_values, s, gamma) for s in mdp.get_all_states()}
# Compute difference
diff = max(abs(new_state_values[s] - state_values[s]) for s in mdp.get_all_states())
state_values = new_state_values
if diff < min_difference:
print("Terminated")
break
the 'for-loop' version runs for barely any iterations, while the dictionary-comprehension runs for many more iterations.
update: the above code works and converges (and I suppose is the most pythonic, imo). The accepted answer provides good insight on the different methods.
Upvotes: 1
Views: 89
Reputation: 155497
The non-comprehension version accumulates values, without discarding those from the previous runs of the outer loop. If you want it to be equivalent, you'd need to change:
for i in range(num_iter):
for s in mdp.get_all_states():
new_state_values[s] = get_new_state_value(mdp, state_values, s, gamma)
to:
for i in range(num_iter):
new_state_values = {} # NEW!!!
for s in mdp.get_all_states():
new_state_values[s] = get_new_state_value(mdp, state_values, s, gamma)
to reinitialize new_state_values
to a clean dict
.
In your full code, the non-comprehension solution would leave both state_values
and new_state_values
as aliases of the same dict
(so state_values
would be changing as you were using it), making the problem even worse; the dict
comprehension fixes it by building a new dict
without modifying state_values
as it's being built.
Upvotes: 3