Reputation: 1685
My GPU info is below.
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.79       Driver Version: 410.79       CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 750 Ti  Off  | 00000000:01:00.0  On |                  N/A |
| 34%   51C    P0     2W /  38W |   1909MiB /  1993MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      3492      C   python                                      1467MiB |
|    0      7875      G   ...yCharm-C/ch-0/193.5233.109/jbr/bin/java     2MiB |
|    0     30812      G   /usr/lib/xorg/Xorg                           163MiB |
|    0     31133      G   kwin_x11                                      25MiB |
|    0     31137      G   /usr/bin/krunner                               1MiB |
|    0     31139      G   /usr/bin/plasmashell                          55MiB |
|    0     31536      G   ...uest-channel-token=13296030830960435903   176MiB |
+-----------------------------------------------------------------------------+
When I run the MNIST tutorial here: https://www.tensorflow.org/tutorials/quickstart/beginner
I get this error:
2019-12-10 00:27:06.891510: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 115 MB memory) -> physical GPU (device: 0, name: GeForce GTX 750 Ti, pci bus id: 0000:01:00.0, compute capability: 5.0)
2019-12-10 00:27:06.894510: I tensorflow/stream_executor/cuda/cuda_driver.cc:830] failed to allocate 115.56M (121176064 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-12-10 00:27:22.271281: F ./tensorflow/core/kernels/random_op_gpu.h:227] Non-OK-status: GpuLaunchKernel(FillPhiloxRandomKernelLaunch<Distribution>, num_blocks, block_size, 0, d.stream(), gen, data, size, dist) status: Internal: out of memory
I am using TF 2 on Ubuntu. I have two questions: 1) My Ubuntu machine has 64 GB of RAM, and my GPU has about 2 GB of memory. When it reports the 'out of memory' error, is that because training uses only the GPU's memory, not the 64 GB?
2) How can I solve this out-of-memory error?
Upvotes: 0
Views: 1935
Reputation: 123
The only way to solve it is to NOT use the GPU. Your training will be slow, but at least it will work.
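A minimal sketch of running on the CPU only, assuming your TensorFlow build honours the standard `CUDA_VISIBLE_DEVICES` environment variable. The variable has to be set before TensorFlow is imported. The `try`/`except` guard is only there so the snippet also runs on a machine without TensorFlow installed.

```python
import os

# Hide every CUDA device from this process. This must happen before
# TensorFlow is imported, otherwise the GPU may already be initialised.
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

try:
    import tensorflow as tf
except ImportError:  # guard so the sketch still runs where TensorFlow is absent
    tf = None

if tf is not None:
    # With no visible GPU, ops are placed on the CPU and use system RAM.
    gpus = tf.config.experimental.list_physical_devices("GPU")
    print("Visible GPUs:", gpus)
```

Put these lines at the very top of the tutorial script, then run the rest of the tutorial unchanged.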
Upvotes: 1
Reputation: 15063
Yes, training uses the GPU's memory, because the data is fed to the GPU during training.
The problem is that your video card has very little video memory. 2 GB of VRAM is not enough for most deep learning work.
I recommend using a video card with at least 6 GB of VRAM.
If upgrading the hardware is not an option, you could use a cloud GPU through AWS (Amazon Web Services) or Google Colab.
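To see how much of the card is actually free, subtract the used figure from the total. In the question's nvidia-smi output, 1909 MiB of 1993 MiB is already in use, 1467 MiB of it by another python process (PID 3492), so very little is left for a new TensorFlow process. A small sketch of that arithmetic follows. The helper names `free_mib_from_csv` and `free_mib_live` are made up for illustration, and the sample line is built from the figures in the question.

```python
import subprocess

# Standard nvidia-smi CSV query: one "total, used" line per GPU, in MiB.
QUERY = [
    "nvidia-smi",
    "--query-gpu=memory.total,memory.used",
    "--format=csv,noheader,nounits",
]


def free_mib_from_csv(line):
    """Return free memory in MiB from a 'total, used' CSV line."""
    total, used = (int(field) for field in line.split(","))
    return total - used


def free_mib_live():
    """Query the local driver; assumes nvidia-smi is on PATH."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return [free_mib_from_csv(line) for line in out.strip().splitlines()]


# Figures copied from the nvidia-smi table in the question.
print(free_mib_from_csv("1993, 1909"))  # 84
```

Calling `free_mib_live()` on your own machine gives the current value per GPU. If the python process holding 1467 MiB is a leftover training run or notebook kernel, stopping it should release most of that memory, although 2 GB will still be tight for larger models.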
Upvotes: 1