TensorFlow cannot find valid device for node, even after casting to float32

Question

Hello, I am getting an error when I try to run my model.

I am using TF 2.1 and have written my model as a class for a few reasons.
The model has two output layers, advantage and value, because I am building a dueling deep Q-network.

Here is my __init__ method -

class model(Model):
    def __init__(self):
        super(model, self).__init__()
        self.lr = 0.01
        self.conv1 = Conv2D(filters=32, input_shape=(210, 160, 1), kernel_size=(3, 3), strides=1, padding='same', activation='elu')#(self.inp)

        self.conv2 = Conv2D(filters=32, kernel_size=(3, 3), strides=1, padding='same', activation='elu')#(self.conv1)
        self.mp2 = MaxPool2D(pool_size=(3, 3), strides=1, padding='same')#(self.conv2)

        self.conv3 = Conv2D(filters=64, kernel_size=(3, 3), strides=1, padding='same', activation='elu')#(self.mp2)
        self.mp3 = MaxPool2D(pool_size=(3, 3), strides=1, padding='same')#(self.conv3)

        self.conv4 = Conv2D(filters=64, kernel_size=(3, 3), strides=1, padding='same', activation='elu')#(self.mp3)
        self.mp4 = MaxPool2D(pool_size=(3, 3), strides=1, padding='same')#(self.conv4)

        self.flat = Flatten() #(self.mp6)
        self.value = Dense(1, activation=None)#(self.flat) # how good is a particular state
        self.advantage = Dense(env.action_space.n, activation=None)#(self.flat) # which is best action
        self.compile(optimizer=Adam(lr=self.lr), loss='mse', metrics=['accuracy'])

Then I have a function called predict_advantage, which is where I get the error -

def predict_advantage(self, state):
        state = tf.cast(cv2.cvtColor(state, cv2.COLOR_RGB2GRAY), tf.float32)
        #x = self.inp(state)
        x = self.conv1(state)

        x=self.conv2(x)
        x=self.mp2(x)

        x=self.conv3(x)
        x=self.mp3(x)

        x=self.conv4(x)
        x=self.mp4(x)

        x = self.flat(x)
        # value = self.value(x)
        x = self.advantage(x)
        return x

As you can see, I am using tf.cast to make the dtype float32, since most posts say that is the only way to fix the error. However, I still get the very same error as before -

tensorflow.python.framework.errors_impl.NotFoundError: Could not find valid device for node.
Node:{{node MatMul}}

It also printed the devices and dtypes for some particular layer, or maybe for all the layers. I do not know what this output means, but here it is -

All kernels registered for op MatMul :
  device='GPU'; T in [DT_FLOAT]
  device='GPU'; T in [DT_DOUBLE]
  device='GPU'; T in [DT_COMPLEX64]
  device='GPU'; T in [DT_COMPLEX128]
  device='GPU'; T in [DT_HALF]
  device='CPU'; label='eigen'; T in [DT_FLOAT]
  device='CPU'; label='eigen'; T in [DT_DOUBLE]
  ..........
  ..........
  ..........
  device='CPU'; T in [DT_COMPLEX64]
  device='CPU'; T in [DT_COMPLEX128]
  device='GPU'; label='cublas'; T in [DT_FLOAT]
  device='GPU'; label='cublas'; T in [DT_DOUBLE]
  device='GPU'; label='cublas'; T in [DT_COMPLEX64]
  device='GPU'; label='cublas'; T in [DT_COMPLEX128]
  device='GPU'; label='cublas'; T in [DT_HALF]
 [Op:MatMul] name: dense_1/Tensordot/MatMul/

As we can see, some parameters seem to be on the GPU and some on the CPU. Why is it doing that?
The dtypes of the parameters also differ. I am not sure whether that is allowed.

As far as I know, the error occurs because things on the GPU cannot interact with things on the CPU. So why is it keeping my parameters on different devices?

Edit:

Here is a link to the full code - https://pastebin.com/sd8L2xAM
Here is also the full error, in case you want to find the line where it occurs - https://pastebin.com/C9Dy5NxL

TF_Support · Accepted Answer

The error appears to be a generic one for a type mismatch.
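
The list printed with the error is the set of kernels registered for the MatMul op, one entry per device and dtype combination; it does not describe where the model's parameters are stored. As a rough sketch of how to check an input against it, assuming the observation is a uint8 image array (typical for Atari frames, but not confirmed by the question), the snippet below compares the array's dtype with the dtypes visible in the message. The helper name describe_input and the set LISTED_MATMUL_DTYPES are made up for illustration, and the set only covers the entries that were not elided.

```python
import numpy as np

# Dtypes that appear in the (partly elided) MatMul kernel list from the error message.
LISTED_MATMUL_DTYPES = {"float16", "float32", "float64", "complex64", "complex128"}


def describe_input(state):
    """Return (dtype name, whether that dtype appears among the listed kernels)."""
    name = np.asarray(state).dtype.name
    return name, name in LISTED_MATMUL_DTYPES


# Assumed example: a uint8 grayscale frame with the shape used in the question.
frame = np.zeros((210, 160), dtype=np.uint8)
print(describe_input(frame))                     # ('uint8', False)
print(describe_input(frame.astype(np.float32)))  # ('float32', True)
```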

In the function below, the problem is that the state you pass in is a NumPy array, which results in a type mismatch. Since self.model.advantage is a Dense layer, casting the state from a NumPy array to a float32 Tensor resolves the mismatch.

def choose_action(self, state):
      if np.random.random() < self.epsilon:
          action = np.random.choice(env.action_space.n)
      else:  # we exploit
          print(type(state))  # numpy.ndarray, which causes the type mismatch
          state = tf.cast(state, dtype=tf.float32)  # cast the state to a float32 tensor
          actions = self.model.advantage(state)
          action = np.argmax(actions, axis=1)
      return action
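
If the state is passed through the convolutional layers from the question instead, the input also needs a batch dimension and a channel dimension, because Conv2D with input_shape=(210, 160, 1) expects 4-D input. Here is a minimal sketch of such a preprocessing step, assuming the observation is a uint8 RGB frame of shape (210, 160, 3). The helper name frame_to_model_input is hypothetical, and tf.image.rgb_to_grayscale is swapped in for the cv2.cvtColor call from the question so that the snippet does not need OpenCV.

```python
import numpy as np
import tensorflow as tf


def frame_to_model_input(frame):
    """Turn one raw RGB frame (H, W, 3) into a float32 tensor of shape (1, H, W, 1)."""
    # Grayscale conversion keeps the input dtype and leaves a trailing channel axis.
    gray = tf.image.rgb_to_grayscale(tf.convert_to_tensor(frame))
    # Cast to float32 so the layers receive a dtype their kernels support.
    gray = tf.cast(gray, tf.float32)
    # Add the leading batch dimension expected by Conv2D.
    return tf.expand_dims(gray, axis=0)


# Hypothetical stand-in for an Atari observation: uint8, shape (210, 160, 3).
fake_frame = np.random.randint(0, 256, size=(210, 160, 3), dtype=np.uint8)
batch = frame_to_model_input(fake_frame)
print(batch.dtype, batch.shape)
```

The result could then be fed to the first convolutional layer of the model, with the Flatten layer applied before the Dense heads.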
