Reputation: 1889
I am trying to run a network (convolution, highway, FC, RNN) that is too big for the GPU, so I set the device globally to "cpu". Still, when I execute the script, it throws a GPU error after building the model, while initializing the variables.
with tf.Session() as sess:
    with tf.device("cpu:0"):
        model = CNN_FC_LANGUAGE(sess, checkpoint_dir=FLAGS.checkpoint_dir,
                                char_embed_dim=FLAGS.char_embed_dim,
                                summaries_dir=FLAGS.summaries_dir,
                                feature_maps=eval(FLAGS.feature_maps),
                                kernels=eval(FLAGS.kernels),
                                batch_size=FLAGS.batch_size,
                                dropout_prob=FLAGS.dropout_prob,
                                forward_only=FLAGS.forward_only,
                                seq_length=FLAGS.seq_length,
                                prediction_starts=FLAGS.prediction_starts,
                                prediction_length=FLAGS.prediction_length,
                                use_char=FLAGS.use_char,
                                highway_layers=FLAGS.highway_layers,
                                rnn_size=FLAGS.rnn_size,
                                rnn_layer_depth=FLAGS.rnn_layer_depth,
                                use_batch_norm=FLAGS.use_batch_norm,
                                run_name=run_name,
                                data_dir=FLAGS.data_dir)
        model.run(FLAGS.epoch, FLAGS.learning_rate, FLAGS.learning_rate_decay, FLAGS.net2net)
Searching all the scripts involved for "gpu" gives 0 results. When creating the model I also print every tensor name together with its device, and searching that output for "gpu" gives 0 results as well.
Still, when the script runs it throws a CUDA error. But why would it allocate any memory on the GPU if the device is explicitly set to CPU?
E tensorflow/stream_executor/cuda/cuda_driver.cc:1034] failed to alloc 2147483648 bytes on host: CUDA_ERROR_OUT_OF_MEMORY
W ./tensorflow/core/common_runtime/gpu/pool_allocator.h:195] could not allocate pinned host memory of size: 2147483648
E tensorflow/stream_executor/cuda/cuda_driver.cc:1034] failed to alloc 1932735232 bytes on host: CUDA_ERROR_OUT_OF_MEMORY
W ./tensorflow/core/common_runtime/gpu/pool_allocator.h:195] could not allocate pinned host memory of size: 1932735232
E tensorflow/stream_executor/cuda/cuda_driver.cc:1034] failed to alloc 1739461632 bytes on host: CUDA_ERROR_OUT_OF_MEMORY
W ./tensorflow/core/common_runtime/gpu/pool_allocator.h:195] could not allocate pinned host memory of size: 1739461632
E tensorflow/stream_executor/cuda/cuda_driver.cc:1034] failed to alloc 1565515520 bytes on host: CUDA_ERROR_OUT_OF_MEMORY
W ./tensorflow/core/common_runtime/gpu/pool_allocator.h:195] could not allocate pinned host memory of size: 1565515520
E tensorflow/stream_executor/cuda/cuda_driver.cc:1034] failed to alloc 1408964096 bytes on host: CUDA_ERROR_OUT_OF_MEMORY
W ./tensorflow/core/common_runtime/gpu/pool_allocator.h:195] could not allocate pinned host memory of size: 1408964096
E tensorflow/stream_executor/cuda/cuda_driver.cc:1034] failed to alloc 4294967296 bytes on host: CUDA_ERROR_OUT_OF_MEMORY
W ./tensorflow/core/common_runtime/gpu/pool_allocator.h:195] could not allocate pinned host memory of size: 4294967296
E tensorflow/stream_executor/cuda/cuda_driver.cc:1034] failed to alloc 4294967296 bytes on host: CUDA_ERROR_OUT_OF_MEMORY
W ./tensorflow/core/common_runtime/gpu/pool_allocator.h:195] could not allocate pinned host memory of size: 4294967296
E tensorflow/stream_executor/cuda/cuda_driver.cc:1034] failed to alloc 4294967296 bytes on host: CUDA_ERROR_OUT_OF_MEMORY
W ./tensorflow/core/common_runtime/gpu/pool_allocator.h:195] could not allocate pinned host memory of size: 4294967296
E tensorflow/stream_executor/cuda/cuda_driver.cc:1034] failed to alloc 4294967296 bytes on host: CUDA_ERROR_OUT_OF_MEMORY
W ./tensorflow/core/common_runtime/gpu/pool_allocator.h:195] could not allocate pinned host memory of size: 4294967296
Killed
Any ideas? Thx
Edit: Also, when building the graph, TensorFlow prints:
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: GeForce GTX 1050 Ti
major: 6 minor: 1 memoryClockRate (GHz) 1.468
pciBusID 0000:04:00.0
Total memory: 3.94GiB
Free memory: 3.64GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1050 Ti, pci bus id: 0000:04:00.0)
But why? I told it to only use the cpu, right?
Upvotes: 0
Views: 1099
Reputation: 126154
The GPU version of TensorFlow will always attempt to initialize the GPU runtime (including devices and allocators) if a GPU is available. As X3liF observes, the error you are seeing comes from allocating pinned host (i.e. CPU) memory, which can be accessed more efficiently in case you try to use the GPU.
To avoid using any GPU resources at all, you can set the CUDA_VISIBLE_DEVICES environment variable when starting Python. Let's say your code is in a file called my_script.py:
# An empty CUDA_VISIBLE_DEVICES will hide all GPUs from TensorFlow.
$ CUDA_VISIBLE_DEVICES="" python my_script.py
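If changing the launch command is not convenient, the same variable can be set from inside the script, as long as that happens before TensorFlow is imported, because the CUDA runtime reads it when it initializes. A minimal sketch of that idea follows; `hide_gpus` is a hypothetical helper name used only for illustration, not part of TensorFlow.

```python
import os


def hide_gpus():
    """Hide all CUDA devices from any library imported after this call.

    Hypothetical helper: it only sets the environment variable that the
    CUDA runtime consults when it starts up.
    """
    os.environ["CUDA_VISIBLE_DEVICES"] = ""


# Call this at the very top of the script, before TensorFlow is imported.
hide_gpus()

# import tensorflow as tf  # import only after the variable has been set
```

Setting the variable after TensorFlow has already initialized the GPU runtime has no effect, which is why the ordering matters here.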
Upvotes: 4
Reputation: 1074
Pinned memory is allocated with a call to cudaMallocHost. This function doesn't allocate global GPU memory: the memory is allocated on the host side, but with properties that allow faster copies over PCI Express.
Moreover, cudaMallocHost needs contiguous memory. Maybe your host memory is fragmented into small, sparse allocations, so cudaMallocHost fails.
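The sizes in the question's log point the same way. As a rough check, the sketch below converts the failed request sizes (copied from the "failed to alloc ... bytes on host" lines) to GiB and compares them with the 3.94 GiB the log reports as the card's total memory. The variable and function names are made up for this illustration.

```python
# Request sizes copied from the "failed to alloc ... bytes on host" log lines.
failed_requests = [
    2147483648,
    1932735232,
    1739461632,
    1565515520,
    1408964096,
    4294967296,
]

GIB = 1024 ** 3
GPU_TOTAL_GIB = 3.94  # "Total memory" reported for the GTX 1050 Ti in the log


def to_gib(num_bytes):
    """Convert a byte count to GiB."""
    return num_bytes / GIB


for size in failed_requests:
    note = "larger than the whole GPU" if to_gib(size) > GPU_TOTAL_GIB else ""
    print("{:>10d} bytes = {:.2f} GiB {}".format(size, to_gib(size), note))
```

The largest requests are 4 GiB each, more than the card has in total, which fits the log's statement that these are host-side allocations rather than GPU memory.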
Upvotes: 3