IndexError Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_10800\268253893.py in <module>
15 next_state, reward, done,trauncated,info = env.step(action)
16 #if state == int:
---> 17 q[state,action] = q[state,action] + LEARNING_RATE*(reward + GAMMA*np.max(q[next_state,:]) - q[state,action])
18 state = next_state
19
IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
I know that the first state value is a tuple, but if I ignore that tuple and start with an int, the learning doesn't seem to work. I took this code from the freeCodeCamp TensorFlow 2.0 tutorial, and it seemed to work for the instructor who was teaching it.
rewards = []
for episode in range(EPISODES):
    state = env.reset()
    for _ in range(MAX_STEPS):
        if RENDER:
            env.render()
        if np.random.uniform(0, 1) < epsilon:
            action = env.action_space.sample()
        else:
            #if state == int:
            action = np.argmax(q[state, :])
        next_state, reward, done, truncated, info = env.step(action)
        #if state == int:
        q[state, action] = q[state, action] + LEARNING_RATE * (reward + GAMMA * np.max(q[next_state, :]) - q[state, action])
        state = next_state
        if done:
            rewards.append(reward)
            epsilon -= 0.001
            break
print(q)
print("Score over time: " + str(sum(rewards) / EPISODES))
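A minimal reproduction of the error, as a sketch: the 16 x 4 Q-table (FrozenLake-sized) and the (0, {"prob": 1}) stand-in for what reset() returns are illustrative assumptions, not values taken from the question. It shows that NumPy rejects the (observation, info) pair as a row index, while the unpacked integer observation works.

```python
import numpy as np

# Illustrative Q-table: 16 states x 4 actions (FrozenLake-sized, assumed).
q = np.zeros((16, 4))

# With the newer API, "state = env.reset()" binds the whole (observation, info)
# pair. This tuple is a hypothetical stand-in for that return value.
state = (0, {"prob": 1})
action = 2

error_message = None
try:
    q[state, action]  # the tuple cannot be converted to an integer index
except IndexError as err:
    error_message = str(err)
    print("IndexError:", error_message)

# Unpacking the pair leaves a plain int observation, which is a valid index.
observation, info = state
print(q[observation, action])  # prints 0.0
```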
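This problem occurs because of env.reset(). In Gym 0.26 and later (and in Gymnasium), this function returns two values: the observation (in your case, state) and an info dictionary. Writing state = env.reset() therefore stores the whole tuple in state, and NumPy cannot use that tuple as an index into q, hence the IndexError. Your env.step() call already unpacks the five values of the same newer API; reset() needs the matching change. The freeCodeCamp tutorial was made with an older Gym release in which reset() returned only the observation, which is why the code worked there. Retrieve both values from env.reset() like this:

state, info = env.reset()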
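To check the fix without installing Gym, here is a minimal sketch of the same tabular Q-learning loop with the unpacked reset(). ChainEnv is a hypothetical stand-in that only mimics the newer signatures (reset() returns (observation, info); step() returns five values); the corridor environment, its sample_action() helper, and the hyperparameter values are assumptions for illustration. Following the Gymnasium convention, the loop ends an episode when either terminated or truncated is true.

```python
import numpy as np


class ChainEnv:
    """Hypothetical stand-in for a Gym/Gymnasium environment.

    Mimics only the newer call signatures:
      reset() -> (observation, info)
      step(a) -> (observation, reward, terminated, truncated, info)
    A corridor of n_states cells: start at 0, reward 1 on reaching the last cell.
    """

    def __init__(self, n_states=5, seed=0):
        self.n_states = n_states
        self.n_actions = 2  # 0 = left, 1 = right
        self.rng = np.random.default_rng(seed)
        self.s = 0

    def reset(self):
        self.s = 0
        return self.s, {}  # (observation, info), like the newer API

    def sample_action(self):
        # Hypothetical replacement for env.action_space.sample()
        return int(self.rng.integers(self.n_actions))

    def step(self, action):
        move = 1 if action == 1 else -1
        self.s = min(max(self.s + move, 0), self.n_states - 1)
        terminated = self.s == self.n_states - 1
        reward = 1.0 if terminated else 0.0
        return self.s, reward, terminated, False, {}


EPISODES = 500
MAX_STEPS = 50
LEARNING_RATE = 0.8
GAMMA = 0.95

env = ChainEnv()
rng = np.random.default_rng(1)
q = np.zeros((env.n_states, env.n_actions))
epsilon = 0.9
rewards = []

for episode in range(EPISODES):
    state, info = env.reset()  # the fix: unpack so state is a plain int
    for _ in range(MAX_STEPS):
        if rng.uniform(0, 1) < epsilon:
            action = env.sample_action()
        else:
            action = int(np.argmax(q[state, :]))
        next_state, reward, terminated, truncated, info = env.step(action)
        # Q-learning update: move q[state, action] toward reward + GAMMA * max_a q[next_state, a]
        q[state, action] += LEARNING_RATE * (
            reward + GAMMA * np.max(q[next_state, :]) - q[state, action]
        )
        state = next_state
        if terminated or truncated:
            rewards.append(reward)
            epsilon -= 0.001
            break

print(q)
print("Score over time: " + str(sum(rewards) / EPISODES))
```

Because the loop breaks on the terminal step, the terminal row of q is never updated and stays at zero, so the bootstrap term contributes nothing there. With a real environment you would keep env.action_space.sample() and your own constants; the unpacking line is the part that addresses the IndexError.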