How does this dictionary comprehension differ from this "for" loop?

Question

I have tried implementing an algorithm with both a dictionary comprehension and a for loop, both of which I believed were set to achieve the same results.

dictionary comprehension

for i in range(num_iter):
    new_state_values = {s: get_new_state_value(mdp, state_values, s, gamma) for s in mdp.get_all_states()}

for loop

for i in range(num_iter):
    for s in mdp.get_all_states():
        new_state_values[s] = get_new_state_value(mdp, state_values, s, gamma)

as my algorithm runs, these achieve very different results. Can someone point out to me where the difference lies in the two?

Details

the full algorithm is below

# parameters
gamma = 0.9  # discount for MDP
num_iter = 100  # maximum iterations, excluding initialization
min_difference = 0.001  # stop VI if new values are this close to old values (or closer)

# initialize V(s)
state_values = {s: 0 for s in mdp.get_all_states()}

for i in range(num_iter):
    new_state_values = {s: get_new_state_value(mdp, state_values, s, gamma) for s in mdp.get_all_states()}

    # Compute difference
    diff = max(abs(new_state_values[s] - state_values[s]) for s in mdp.get_all_states())
    state_values = new_state_values

    if diff < min_difference:
        print("Terminated")
        break

the 'for-loop' version runs for barely any iterations, while the dictionary-comprehension runs for many more iterations.

update: the above code works and converges (and I suppose is the most pythonic, imo). The accepted answer provides good insight on the different methods.

ShadowRanger · Accepted Answer

The non-comprehension version accumulates values, without discarding those from the previous runs of the outer loop. If you want it to be equivalent, you'd need to change:

for i in range(num_iter):
    for s in mdp.get_all_states():
        new_state_values[s] = get_new_state_value(mdp, state_values, s, gamma)

to:

for i in range(num_iter):
    new_state_values = {}  # NEW!!!
    for s in mdp.get_all_states():
        new_state_values[s] = get_new_state_value(mdp, state_values, s, gamma)

to reinitialize new_state_values to a clean dict.

In your full code, the non-comprehension solution would leave both state_values and new_state_values as aliases of the same dict (so state_values would be changing as you were using it), making the problem even worse; the dict comprehension fixes it by building a new dict without modifying state_values as it's being built.

How does this dictionary comprehension differ from this "for" loop?

dictionary comprehension

for loop

Details

Answers (1)

Related Questions

How does this dictionary comprehension differ from this &quot;for&quot; loop?

dictionary comprehension

for loop

Details

Answers (1)

Related Questions

How does this dictionary comprehension differ from this "for" loop?