Valentyn Vovk

Reputation: 117

CUDA driver errors on a machine without a GPU while loading a model

I have a computer with a few NVIDIA GPUs, use the 'segmentation_models' package, and build a NN based on Unet:

import segmentation_models as sm
import keras.backend as K
from keras import optimizers
from keras.utils import multi_gpu_model

lr = 2e-4
NUM_GPUS = 3
learning_rate = lr * NUM_GPUS

adam = optimizers.Adam(lr=learning_rate)

def dice_coef(y_true, y_pred, smooth=1):
    y_true_f = K.flatten(y_true)
    y_pred_f = K.flatten(y_pred)
    intersection = K.sum(y_true_f * y_pred_f)
    return (2. * intersection + smooth) / (K.sum(y_true_f) + K.sum(y_pred_f) + smooth)

model = sm.Unet('efficientnetb3', encoder_weights='imagenet', classes=4, activation='softmax', encoder_freeze=False)
parallel_model = multi_gpu_model(model, gpus=NUM_GPUS)  # wrap the model to train on NUM_GPUS GPUs
model = parallel_model
model.compile(adam, 'categorical_crossentropy', [dice_coef])
history = model.fit_generator(
        generator=train_gen, steps_per_epoch=len(train_gen),
        validation_data=validation_gen,
        epochs=50, callbacks=[clr, checkpoints, csv_logger],
        initial_epoch=0)

After training I save the weights for future use in CPU mode:

single_gpu_model = model.layers[-2]  # the underlying single-GPU model inside the multi_gpu_model wrapper
single_gpu_model.save(single_proc_model_path_1_kernel)

And then I try to use these weights:

import keras
model1 = keras.models.load_model(single_proc_model_path_1_kernel)
...
pr_mask = self.model1.predict(img_exp)

tensorflow/stream_executor/cuda/cuda_driver.cc:300] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected

tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer.so.6'; dlerror: libnvinfer.so.6: cannot open shared object file: No such file or directory
tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer_plugin.so.6'; dlerror: libnvinfer_plugin.so.6: cannot open shared object file: No such file or directory
tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:30] Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
Segmentation Models: using keras framework.
tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
tensorflow/stream_executor/cuda/cuda_driver.cc:351] failed call to cuInit: UNKNOWN ERROR (303)
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (b36a4cf2df2e): /proc/driver/nvidia/version does not exist

What should I change to force the code to work on a machine with CPUs only?

Upvotes: 3

Views: 2278

Answers (2)

Valentyn Vovk

Reputation: 117

TensorFlow 1.15 resolved all the problems.

Upvotes: 0

DeusXMachina

Reputation: 1399

You can try setting the environment variable CUDA_VISIBLE_DEVICES to either blank, the empty string "", or possibly -1.
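A minimal sketch of what that looks like (the exact value that works can vary a little between TensorFlow versions, and it has to be set before TensorFlow/Keras is imported):

import os

# Hide all CUDA devices from TensorFlow; this must run before the first
# tensorflow/keras import, otherwise it may have no effect.
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"   # "" is also commonly used

import keras  # only import Keras/TensorFlow after the variable is set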

Otherwise you'll need to tell the TensorFlow backend to use the CPU only.

See also: Can Keras with Tensorflow backend be forced to use CPU or GPU at will?
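For TF 1.x with standalone Keras (which the code in the question appears to use), one common way to do this is to hand the backend a session that exposes zero GPUs. This is a sketch, not tested against your exact setup:

import tensorflow as tf
import keras.backend as K

# Build a session config that reports no GPU devices and give it to Keras,
# so all ops are placed on the CPU.
config = tf.ConfigProto(device_count={"GPU": 0})
K.set_session(tf.Session(config=config))

On TensorFlow 2.x the equivalent is tf.config.set_visible_devices([], "GPU"), called before any GPU has been initialized.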

Note that Keras's multi_gpu_model is deprecated and you should alter your code to use tf.distribute.MirroredStrategy instead. I haven't personally worked with it, but I imagine this new API is designed to work more seamlessly across GPU/CPU situations like yours.
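Roughly, the MirroredStrategy version of your training setup would look something like the sketch below. This is only an outline: it assumes segmentation_models is running on top of tf.keras (e.g. via the SM_FRAMEWORK environment variable) and it reuses the Dice metric from your question.

import os
os.environ.setdefault("SM_FRAMEWORK", "tf.keras")  # make segmentation_models use tf.keras

import tensorflow as tf
import segmentation_models as sm
from tensorflow.keras import backend as K

def dice_coef(y_true, y_pred, smooth=1):
    # Same Dice coefficient as in the question, written against tf.keras.
    y_true_f = K.flatten(y_true)
    y_pred_f = K.flatten(y_pred)
    intersection = K.sum(y_true_f * y_pred_f)
    return (2. * intersection + smooth) / (K.sum(y_true_f) + K.sum(y_pred_f) + smooth)

# MirroredStrategy replicates the model across all visible GPUs; on a
# CPU-only machine it simply falls back to a single replica.
strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
    model = sm.Unet('efficientnetb3', encoder_weights='imagenet',
                    classes=4, activation='softmax', encoder_freeze=False)
    model.compile(tf.keras.optimizers.Adam(2e-4 * strategy.num_replicas_in_sync),
                  'categorical_crossentropy', [dice_coef])

model.fit(...) and model.save(...) are then called on this model directly; there is no wrapper model, so the model.layers[-2] workaround from your question is no longer needed.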

Upvotes: -1
