GAUSS

Reputation: 21

Variable updating wrong in loop - Python (Q-learning)

Why do position and newposition print the same values and update together on the next loop iteration?

for game in range(nr_of_games):
    # Initialize the player at the start position and store the current position in position
    position=np.array([0,19])

    status = -1
    # loop over steps taken by the player
    while status == -1: #the status of the game is -1, terminate if 1 (see status_list above)

        # Find out what move to make using  
        q_in=Q[position[0],position[1]]

        
        move, action = action_fcn(q_in,epsilon,wind)
        
        # update location, check grid,reward_list, and status_list 
        
        newposition[0] = position[0] + move[0]
        newposition[1] = position[1] + move[1]
        
        print('new loop')
        print(newposition)
        print(position)
        
        
        grid_state = grid[newposition[0]][newposition[1]]
        reward = reward_list[grid_state]
        
        status = status_list[grid_state]
        status = int(status)
        
        if status == 1:
            Q[position[0],position[1],action]= reward
            break #Game over 
            
        else: Q[position[0],position[1],action]= (1-alpha)*Q[position[0],position[1],action]+alpha*(reward+gamma*Q[newposition[0],newposition[1],action])
           
        position = newposition

Printed output:

new loop
[16 26]
[16 26]
new loop
[17 26]
[17 26]
new loop
[18 26]
[18 26]
new loop
[19 26]
[19 26]
new loop
[19 25]
[19 25]
new loop
[20 25]
[20 25]

Upvotes: -2

Views: 164

Answers (2)

Rustem Zakiev

Reputation: 626

That is because you are trying to copy one array to another with the = operator. For lists and NumPy arrays, = does not copy anything: it binds the name on the left to the same object as the name on the right, so both names refer to the same memory.

To make a real copy, use the copy() method: list.copy() for a list, or position = newposition.copy() for a NumPy array as in your code.
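The difference between binding a second name and making a copy can be seen in a minimal sketch. The names alias and independent are hypothetical and only serve the illustration; the start position is taken from the question.

```python
import numpy as np

position = np.array([0, 19])

# Plain assignment: no data is copied, both names refer to one array.
alias = position
alias[0] = 5                      # also visible through position

# ndarray.copy() allocates a new array with its own data.
independent = position.copy()
independent[1] = 7                # position is not affected

print(position, alias, independent)
```

After running this, position and alias show the same modified first element, while the change made to independent does not appear in position.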

Upvotes: 1

keepAlive

Reputation: 6665

Your loop ends with

>>> position = newposition

(and you may also bind the two names together somewhere you do not show us). So from the next iteration on, when you write into newposition, you are actually writing into position as well.

So just make newposition a different object from position, i.e. make sure that id(newposition) != id(position), and you will be fine. I guess that these two ids are currently the same, aren't they?

Why does the position and newposition give the same output and update together in the next loop?

Because they are the same object. I am not (only) saying that they are equal, I am saying that newposition is position, i.e. you currently have (newposition is position) is True.
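One way to check this, sketched here under the assumption that both variables are NumPy arrays as in the question, is to compare equality and identity before and after the rebinding. The variable names same_values, same_object_before, same_object_after and shares are hypothetical.

```python
import numpy as np

position = np.array([0, 19])
newposition = np.array([0, 19])   # equal values, but a separate object

same_values = np.array_equal(position, newposition)
same_object_before = newposition is position

# Rebinding the name, as at the end of the loop in the question.
position = newposition

same_object_after = newposition is position
shares = np.shares_memory(position, newposition)

print(same_values, same_object_before, same_object_after, shares)
```

Equality holds in both states; only the identity test changes once the name is rebound.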

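Define newposition independently of position, and copy the values at the end of the loop body instead of rebinding the name, since position = newposition would make the two names share one array again. For example:

# [...]
for game in range(nr_of_games):
    # Initialize the player at the start position and store the current position in position
    position    = np.array([0,19])
    newposition = np.empty(2, dtype=int)  # separate integer array, usable as an index
    # [...]
        # [...]
        position[:] = newposition  # copy the values; do not rebind the name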

Also, you may have good reasons for the element-wise version, but keep in mind that if move and position have the same shape and represent the same kind of information, you could also just do

# [...]
    # [...]
        # [...]
        # newposition[0] = position[0] + move[0]
        # newposition[1] = position[1] + move[1]
        newposition = position + move
        # [...]

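and remove the np.empty preallocation of newposition. The addition builds a new array on every iteration, so position = newposition at the end of the loop is harmless: the next step rebinds newposition to a fresh array instead of writing into the one position refers to.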
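A stripped-down sketch of that loop pattern follows. The moves list is a hypothetical stand-in for what action_fcn returns in the question, and history is added only to record what each step sees; the grid, rewards and Q update are left out.

```python
import numpy as np

# Hypothetical moves standing in for the output of action_fcn.
moves = [np.array([1, 0]), np.array([1, 0]), np.array([0, -1])]

position = np.array([0, 19])
history = []

for move in moves:
    # position + move allocates a new array, so position is left untouched.
    newposition = position + move
    history.append((position.tolist(), newposition.tolist()))

    # Rebinding is safe here because the next iteration creates
    # a fresh newposition instead of writing into this one.
    position = newposition

print(history)
```

Each recorded pair holds a different old and new position, which is what the Q update needs to see.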

Upvotes: 1
