Arjun Prakash

Reputation: 1

IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices when indexing a Q-table

I am getting this error while using Q-learning with OpenAI Gym:

IndexError                                Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_10800\268253893.py in <module>
     15         next_state, reward, done,trauncated,info = env.step(action)
     16         #if state == int:
---> 17         q[state,action] = q[state,action] + LEARNING_RATE*(reward + GAMMA*np.max(q[next_state,:]) - q[state,action])
     18         state = next_state
     19 

IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices

I know that the first state variable is a tuple, but if I ignore that tuple and start with an int, the learning doesn't seem to work. I took the code from the freeCodeCamp TensorFlow 2.0 tutorial, where it seemed to work for the instructor.

This is what I tried:

rewards = []
for episode in range(EPISODES):
    state = env.reset()
    for _ in range(MAX_STEPS):
        if RENDER:
            env.render()
        if np.random.uniform(0,1) < epsilon:
            action = env.action_space.sample()
        else:
            #if state == int:
            action = np.argmax(q[state,:])
        next_state, reward, done,trauncated,info = env.step(action)
        #if state == int:
        q[state,action] = q[state,action] + LEARNING_RATE*(reward + GAMMA*np.max(q[next_state,:]) - q[state,action])
        state = next_state

        if done:
            rewards.append(reward)
            epsilon -= 0.001
            break
print(q)
print("Score over time: " +  str(sum(rewards)/EPISODES))

Upvotes: 0

Views: 209

Answers (1)

Lexpj

Reputation: 1083

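This problem occurs because of env.reset(). In recent Gym versions (0.26 and later, as well as Gymnasium), this function returns two values: the observation (in your case, the state) and an info dict. Your code assigns that whole tuple to state, so the first q[state,action] lookup tries to index the Q-table with a tuple instead of an integer, which raises the IndexError. Unpack the return value of env.reset() instead: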

state, info = env.reset()
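To see the fix in context, here is a minimal sketch of the training loop from the question with the unpacked reset. It assumes the Gym 0.26+ style API, which the five-value env.step() in the question suggests. TinyChainEnv is a hypothetical stand-in environment written only for this example (it is not part of Gym), and the hyperparameter values are placeholders, so substitute your own env and constants.

```python
import numpy as np


class TinyChainEnv:
    """Hypothetical 4-state chain that mimics the Gym >= 0.26 return shapes."""

    n_states = 4
    n_actions = 2

    def reset(self):
        self.state = 0
        return self.state, {}  # (observation, info)

    def step(self, action):
        # Action 1 moves right, action 0 moves left (never below state 0).
        if action == 1:
            self.state = min(self.state + 1, self.n_states - 1)
        else:
            self.state = max(self.state - 1, 0)
        terminated = self.state == self.n_states - 1
        reward = 1.0 if terminated else 0.0
        return self.state, reward, terminated, False, {}


# Placeholder hyperparameters, chosen only for this sketch.
LEARNING_RATE, GAMMA, EPISODES, MAX_STEPS = 0.8, 0.9, 200, 50
epsilon = 0.9

env = TinyChainEnv()
q = np.zeros((env.n_states, env.n_actions))
rng = np.random.default_rng(0)

for episode in range(EPISODES):
    state, info = env.reset()  # unpack: state is a plain int, not a tuple
    for _ in range(MAX_STEPS):
        if rng.uniform(0, 1) < epsilon:
            action = int(rng.integers(env.n_actions))  # explore
        else:
            action = int(np.argmax(q[state, :]))  # exploit
        next_state, reward, terminated, truncated, info = env.step(action)
        # Standard Q-learning update, same formula as in the question.
        q[state, action] += LEARNING_RATE * (
            reward + GAMMA * np.max(q[next_state, :]) - q[state, action]
        )
        state = next_state
        if terminated or truncated:
            epsilon = max(epsilon - 0.001, 0.0)
            break

print(q)
```

With a real Gym environment, only the env construction changes; the loop body stays the same. Checking both terminated and truncated (the fourth value returned by env.step()) ends the episode on time limits as well as on terminal states.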

Upvotes: 1
