Reputation: 21
Why do position and newposition give the same output and update together in the next loop?
for game in range(nr_of_games):
    # Initialize the player at the start position and store the current position in position
    position=np.array([0,19])
    status = -1
    # loop over steps taken by the player
    while status == -1: #the status of the game is -1, terminate if 1 (see status_list above)
        # Find out what move to make using
        q_in=Q[position[0],position[1]]
        move, action = action_fcn(q_in,epsilon,wind)
        # update location, check grid,reward_list, and status_list
        newposition[0] = position[0] + move[0]
        newposition[1] = position[1] + move[1]
        print('new loop')
        print(newposition)
        print(position)
        grid_state = grid[newposition[0]][newposition[1]]
        reward = reward_list[grid_state]
        status = status_list[grid_state]
        status = int(status)
        if status == 1:
            Q[position[0],position[1],action]= reward
            break #Game over
        else:
            Q[position[0],position[1],action]= (1-alpha)*Q[position[0],position[1],action]+alpha*(reward+gamma*Q[newposition[0],newposition[1],action])
        position = newposition
print out:
new loop
[16 26]
[16 26]
new loop
[17 26]
[17 26]
new loop
[18 26]
[18 26]
new loop
[19 26]
[19 26]
new loop
[19 25]
[19 25]
new loop
[20 25]
[20 25]
Upvotes: -2
Views: 164
Reputation: 626
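That is because you are trying to copy one array to another with the = operator. Used with lists or NumPy arrays, = copies nothing: it makes the name on the left refer to the same object as the name on the right, so both names point to the same memory cells. After position = newposition, writing to newposition[0] also changes position[0].
To truly copy, use the copy() method (list.copy() for lists, ndarray.copy() for NumPy arrays).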
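As a minimal sketch of the difference, using plain Python lists rather than the arrays from the question (the name alias is only illustrative):

```python
# Plain assignment binds a second name to the same list; nothing is copied.
position = [0, 19]
alias = position            # alias and position are one object
alias[0] += 1               # the write is visible through both names
print(position, alias)

# list.copy() builds a new list, so later writes stay separate.
position = [0, 19]
newposition = position.copy()
newposition[0] += 1         # only the copy changes
print(position, newposition)
```

The same reasoning applies to NumPy arrays, where ndarray.copy() plays the role of list.copy().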
Upvotes: 1
Reputation: 6665
Apparently, somewhere you do not show us, you do
>>> newposition = position
So when you increment newposition, you are actually incrementing position as well.
So just make newposition something different from position. I mean, make them have id(newposition) != id(position) and you will be good. Because currently, I guess that these two ids are the same, aren't they?
Why do position and newposition give the same output and update together in the next loop?
Because they are the same object. I am not (only) saying that they are equal, I am saying that newposition is position, i.e. you currently have (newposition is position) is True.
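A quick way to check this, sketched here under the assumption that an assignment between the two names happened at some point (the variables same_object, same_id, independent and shares are only illustrative):

```python
import numpy as np

position = np.array([0, 19])
newposition = position                 # assumed: the two names were tied together like this

same_object = newposition is position            # identity, not just equality
same_id = id(newposition) == id(position)        # equivalent check via id()

newposition[0] += 1                    # in-place write through one name...
print(position)                        # ...is visible under the other name too

# An explicit copy is a different object with its own memory.
independent = position.copy()
is_independent = independent is not position
shares = np.shares_memory(independent, position)
print(same_object, same_id, is_independent, shares)
```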
Just define newposition independently from position. For example:
# [...]
for game in range(nr_of_games):
    # Initialize the player at the start position and store the current position in position
    position = np.array([0,19])
    newposition = np.empty(2, dtype=int)  # integer dtype, so the entries can still index grid and Q
    # [...]
Also, you may have good reasons to do so, but keep in mind that if move and position have the same shape and convey the "same information", you could also just do
# [...]
    # [...]
        # [...]
        # newposition[0] = position[0] + move[0]
        # newposition[1] = position[1] + move[1]
        newposition = position + move
        # [...]
and remove newposition = np.empty(2, dtype=int). Since position + move builds a new array on every step, the position = newposition at the end of your loop then no longer ties the two names to one array.
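To illustrate why newposition = position + move keeps the two arrays apart across iterations, here is a small sketch. The moves list is a hypothetical stand-in for whatever action_fcn returns in the question, and history is only there to record the values:

```python
import numpy as np

# Hypothetical moves; in the question these come from action_fcn.
moves = [np.array([1, 0]), np.array([1, 0]), np.array([0, -1])]

position = np.array([16, 26])
history = []
for move in moves:
    newposition = position + move      # allocates a fresh array each step
    history.append((position.tolist(), newposition.tolist()))
    position = newposition             # safe: the next step builds another new array

for old, new in history:
    print(old, '->', new)
```

Within each step, position keeps the old value while newposition holds the new one, which is what the Q update in the question relies on.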
Upvotes: 1