Hanzy
Hanzy

Reputation: 414

How does this dictionary comprehension differ from this "for" loop?

I have tried implementing an algorithm with both a dictionary comprehension and a for loop, both of which I believed were set to achieve the same results.

dictionary comprehension

for i in range(num_iter):
    new_state_values = {s: get_new_state_value(mdp, state_values, s, gamma) for s in mdp.get_all_states()}

for loop

for i in range(num_iter):
    for s in mdp.get_all_states():
        new_state_values[s] = get_new_state_value(mdp, state_values, s, gamma)

as my algorithm runs, these achieve very different results. Can someone point out to me where the difference lies in the two?

Details

the full algorithm is below

# parameters
gamma = 0.9  # discount for MDP
num_iter = 100  # maximum iterations, excluding initialization
min_difference = 0.001  # stop VI if new values are this close to old values (or closer)

# initialize V(s)
state_values = {s: 0 for s in mdp.get_all_states()}

for i in range(num_iter):
    new_state_values = {s: get_new_state_value(mdp, state_values, s, gamma) for s in mdp.get_all_states()}

    # Compute difference
    diff = max(abs(new_state_values[s] - state_values[s]) for s in mdp.get_all_states())
    state_values = new_state_values

    if diff < min_difference:
        print("Terminated")
        break

the 'for-loop' version runs for barely any iterations, while the dictionary-comprehension runs for many more iterations.

update: the above code works and converges (and I suppose is the most pythonic, imo). The accepted answer provides good insight on the different methods.

Upvotes: 1

Views: 89

Answers (1)

ShadowRanger
ShadowRanger

Reputation: 155497

The non-comprehension version accumulates values, without discarding those from the previous runs of the outer loop. If you want it to be equivalent, you'd need to change:

for i in range(num_iter):
    for s in mdp.get_all_states():
        new_state_values[s] = get_new_state_value(mdp, state_values, s, gamma)

to:

for i in range(num_iter):
    new_state_values = {}  # NEW!!!
    for s in mdp.get_all_states():
        new_state_values[s] = get_new_state_value(mdp, state_values, s, gamma)

to reinitialize new_state_values to a clean dict.

In your full code, the non-comprehension solution would leave both state_values and new_state_values as aliases of the same dict (so state_values would be changing as you were using it), making the problem even worse; the dict comprehension fixes it by building a new dict without modifying state_values as it's being built.

Upvotes: 3

Related Questions