3D CNN using keras-tensorflow on pycharm ( Process finished with exit code 137 (interrupted by signal 9: SIGKILL) )

Question

I'm building a 3D CNN to classify the LUNA16 data set (a CT scan data set), using keras-tensorflow in PyCharm.

I'm following this code: https://github.com/keras-team/keras-io/blob/master/examples/vision/3D_image_classification.py

I only modified it to fit my (*.mhd) data. This is what I'm running now: https://github.com/Mustafa-MS/3D-CNN-LUNA16/blob/main/3DCNN.py

Each time I run the code, a different error comes up and stops the process, but all of the errors are about memory:

Process finished with exit code 137 (interrupted by signal 9: SIGKILL)
Out of memory
W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 3355443200 exceeds 10% of free system memory.
W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 369098752 exceeds 10% of free system memory.
W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 3355443200 exceeds 10% of free system memory.

My model summary is

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         [(None, 128, 128, 64, 1)] 0         
_________________________________________________________________
conv3d (Conv3D)              (None, 126, 126, 62, 64)  1792      
_________________________________________________________________
max_pooling3d (MaxPooling3D) (None, 63, 63, 31, 64)    0         
_________________________________________________________________
batch_normalization (BatchNo (None, 63, 63, 31, 64)    256       
_________________________________________________________________
conv3d_1 (Conv3D)            (None, 61, 61, 29, 64)    110656    
_________________________________________________________________
max_pooling3d_1 (MaxPooling3 (None, 30, 30, 14, 64)    0         
_________________________________________________________________
batch_normalization_1 (Batch (None, 30, 30, 14, 64)    256       
_________________________________________________________________
conv3d_2 (Conv3D)            (None, 28, 28, 12, 128)   221312    
_________________________________________________________________
max_pooling3d_2 (MaxPooling3 (None, 14, 14, 6, 128)    0         
_________________________________________________________________
batch_normalization_2 (Batch (None, 14, 14, 6, 128)    512       
_________________________________________________________________
conv3d_3 (Conv3D)            (None, 12, 12, 4, 256)    884992    
_________________________________________________________________
max_pooling3d_3 (MaxPooling3 (None, 6, 6, 2, 256)      0         
_________________________________________________________________
batch_normalization_3 (Batch (None, 6, 6, 2, 256)      1024      
_________________________________________________________________
global_average_pooling3d (Gl (None, 256)               0         
_________________________________________________________________
dense (Dense)                (None, 512)               131584    
_________________________________________________________________
dropout (Dropout)            (None, 512)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 513       
=================================================================
Total params: 1,352,897
Trainable params: 1,351,873
Non-trainable params: 1,024
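
As a rough check on where GPU memory goes, the output shapes in the summary above can be converted to bytes. This is a back-of-the-envelope sketch, assuming float32 activations (4 bytes per value) and counting only each layer's forward output; the real footprint during training is likely higher, since gradients and convolution workspace also need memory. The helper name `activation_bytes` is made up for illustration.

```python
from math import prod

# Per-sample output shapes copied from the model summary (batch axis dropped).
OUTPUT_SHAPES = {
    "conv3d": (126, 126, 62, 64),
    "max_pooling3d": (63, 63, 31, 64),
    "batch_normalization": (63, 63, 31, 64),
    "conv3d_1": (61, 61, 29, 64),
    "max_pooling3d_1": (30, 30, 14, 64),
    "batch_normalization_1": (30, 30, 14, 64),
    "conv3d_2": (28, 28, 12, 128),
    "max_pooling3d_2": (14, 14, 6, 128),
    "batch_normalization_2": (14, 14, 6, 128),
    "conv3d_3": (12, 12, 4, 256),
    "max_pooling3d_3": (6, 6, 2, 256),
    "batch_normalization_3": (6, 6, 2, 256),
}


def activation_bytes(shape, batch_size=1, bytes_per_value=4):
    """Bytes needed to hold one layer's output for a whole batch (float32 assumed)."""
    return batch_size * prod(shape) * bytes_per_value


for name, shape in OUTPUT_SHAPES.items():
    mib = activation_bytes(shape, batch_size=2) / 2**20
    print(f"{name:<24}{mib:10.1f} MiB")
```

The first convolution alone produces 126 × 126 × 62 × 64 values per scan, roughly 240 MiB in float32, so even a batch of 2 is not small in terms of activation memory.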

The dimension of each CT scan is (128, 128, 64, 1)

The shapes of the training and validation data are:
xtrain = (800, 128, 128, 64) / xval = (88, 128, 128, 64) / ytrain = (800,) / yval = (88,)

The batch size is 2.
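
The sizes reported in the allocation warnings can be reproduced from these shapes. A small sketch, assuming the arrays are held as float32 (4 bytes per voxel), which is an assumption about the script rather than something stated in the question:

```python
from math import prod

BYTES_PER_FLOAT32 = 4  # assumed dtype of the scan arrays


def array_bytes(shape, bytes_per_value=BYTES_PER_FLOAT32):
    """Size in bytes of a dense array with the given shape."""
    return prod(shape) * bytes_per_value


train_bytes = array_bytes((800, 128, 128, 64))
val_bytes = array_bytes((88, 128, 128, 64))

print(train_bytes)                        # 3355443200
print(val_bytes)                          # 369098752
print(f"{train_bytes / 2**30:.3f} GiB")   # 3.125 GiB
```

Both numbers match the `Allocation of ... exceeds 10% of free system memory` warnings exactly, which suggests that each warning corresponds to one more full copy of the training or validation array being allocated in host RAM.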

I'm monitoring my model using wandb; you can check it out here: https://wandb.ai/mustafa-ms/monitor-gpu?workspace=user-mustafa-ms
It shows that the model is consuming 100% of both system memory and GPU memory just before it stops working.
Picture: GPU memory allocated (%)
Picture: system memory allocated (%)
I know there are tons of answers about the same problem, but none of them fixes my problem.
The CNN is not big, the batch size is very small (only 2), and my data is only 888 CT scans. My PC has 32 GB of memory and an RTX 2080 Ti GPU.
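
One way to keep host memory flat, sketched here under assumptions, is to feed the model one batch at a time instead of handing the whole array to TensorFlow at once. The generator below is plain NumPy; `batch_generator` is a hypothetical helper, and the tiny random arrays stand in for the real scans. Whether this removes the crash on this machine has not been verified.

```python
import numpy as np


def batch_generator(x, y, batch_size, shuffle=True, seed=0):
    """Yield (x_batch, y_batch) forever, copying only one batch at a time."""
    rng = np.random.default_rng(seed)
    n = len(x)
    while True:
        order = rng.permutation(n) if shuffle else np.arange(n)
        for start in range(0, n, batch_size):
            # Sorted indices keep reads sequential if x is a memory-mapped array.
            idx = np.sort(order[start:start + batch_size])
            # Add the channel axis per batch: (b, H, W, D) -> (b, H, W, D, 1).
            x_batch = np.asarray(x[idx], dtype=np.float32)[..., np.newaxis]
            y_batch = np.asarray(y[idx], dtype=np.float32)
            yield x_batch, y_batch


# Small stand-in data; the real arrays would be (800, 128, 128, 64) and (800,).
x_demo = np.random.rand(10, 8, 8, 4).astype(np.float32)
y_demo = np.random.randint(0, 2, size=10)

gen = batch_generator(x_demo, y_demo, batch_size=2)
xb, yb = next(gen)
print(xb.shape, yb.shape)  # (2, 8, 8, 4, 1) (2,)
```

A generator like this could be passed to `model.fit` (with `steps_per_epoch` set, since it never ends) or wrapped with `tf.data.Dataset.from_generator`.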
The full log is here:

import sys; print('Python %s on %s' % (sys.version, sys.platform))
sys.path.extend(['/home/mustafa/home/mustafa/project/LUNAMASK', '/home/mustafa/home/mustafa/project/LUNAMASK'])
PyDev console: starting.
Python 3.8.7 (default, Dec 21 2020, 20:10:35) 
[GCC 7.5.0] on linux
runfile('/home/mustafa/home/mustafa/project/LUNAMASK/3DCNN.py', wdir='/home/mustafa/home/mustafa/project/LUNAMASK')
2021-02-02 05:40:34.999468: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
wandb: Currently logged in as: mustafa-ms (use `wandb login --relogin` to force relogin)
2021-02-02 05:40:37.643336: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
wandb: Tracking run with wandb version 0.10.15
wandb: Syncing run clean-violet-3
wandb: ⭐️ View project at https://wandb.ai/mustafa-ms/monitor-gpu
wandb: 🚀 View run at https://wandb.ai/mustafa-ms/monitor-gpu/runs/4y03vu5s
wandb: Run data is saved locally in /home/mustafa/home/mustafa/project/LUNAMASK/wandb/run-20210202_054036-4y03vu5s
wandb: Run `wandb offline` to turn off syncing.
y train length  800
y test length  88
xtrain =  (800, 128, 128, 64)
xval =  (88, 128, 128, 64)
ytrain =  (800,)
yval =  (88,)
2021-02-02 08:27:05.599099: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-02-02 08:27:05.606801: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2021-02-02 08:27:05.657391: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-02-02 08:27:05.658293: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties: 
pciBusID: 0000:09:00.0 name: GeForce RTX 2080 Ti computeCapability: 7.5
coreClock: 1.605GHz coreCount: 68 deviceMemorySize: 10.76GiB deviceMemoryBandwidth: 573.69GiB/s
2021-02-02 08:27:05.658325: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-02-02 08:27:05.667884: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2021-02-02 08:27:05.667982: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.11
2021-02-02 08:27:05.674032: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-02-02 08:27:05.676356: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-02-02 08:27:05.684058: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
2021-02-02 08:27:05.686346: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11
2021-02-02 08:27:05.687068: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2021-02-02 08:27:05.687204: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-02-02 08:27:05.688185: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-02-02 08:27:05.689043: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2021-02-02 08:27:05.690061: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-02-02 08:27:05.690188: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-02-02 08:27:05.691084: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties: 
pciBusID: 0000:09:00.0 name: GeForce RTX 2080 Ti computeCapability: 7.5
coreClock: 1.605GHz coreCount: 68 deviceMemorySize: 10.76GiB deviceMemoryBandwidth: 573.69GiB/s
2021-02-02 08:27:05.691117: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-02-02 08:27:05.691137: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2021-02-02 08:27:05.691152: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.11
2021-02-02 08:27:05.691165: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-02-02 08:27:05.691179: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-02-02 08:27:05.691192: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
2021-02-02 08:27:05.691205: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11
2021-02-02 08:27:05.691218: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2021-02-02 08:27:05.691292: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-02-02 08:27:05.692206: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-02-02 08:27:05.693051: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2021-02-02 08:27:05.693086: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-02-02 08:27:06.001440: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-02-02 08:27:06.001467: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267]      0 
2021-02-02 08:27:06.001473: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1280] 0:   N 
2021-02-02 08:27:06.001663: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-02-02 08:27:06.002169: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-02-02 08:27:06.002643: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-02-02 08:27:06.003097: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1406] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 9508 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:09:00.0, compute capability: 7.5)
2021-02-02 08:27:06.004312: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 3355443200 exceeds 10% of free system memory.
2021-02-02 08:27:06.983079: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 369098752 exceeds 10% of free system memory.
2021-02-02 08:27:07.406900: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 3355443200 exceeds 10% of free system memory.
2021-02-02 08:27:09.210752: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
2021-02-02 08:27:09.229323: I tensorflow/core/platform/profile_utils/cpu_utils.cc:112] CPU Frequency: 3699750000 Hz
Dimension of the CT scan is: (128, 128, 64, 1)
Model: "3dcnn"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         [(None, 128, 128, 64, 1)] 0         
_________________________________________________________________
conv3d (Conv3D)              (None, 126, 126, 62, 64)  1792      
_________________________________________________________________
max_pooling3d (MaxPooling3D) (None, 63, 63, 31, 64)    0         
_________________________________________________________________
batch_normalization (BatchNo (None, 63, 63, 31, 64)    256       
_________________________________________________________________
conv3d_1 (Conv3D)            (None, 61, 61, 29, 64)    110656    
_________________________________________________________________
max_pooling3d_1 (MaxPooling3 (None, 30, 30, 14, 64)    0         
_________________________________________________________________
batch_normalization_1 (Batch (None, 30, 30, 14, 64)    256       
_________________________________________________________________
conv3d_2 (Conv3D)            (None, 28, 28, 12, 128)   221312    
_________________________________________________________________
max_pooling3d_2 (MaxPooling3 (None, 14, 14, 6, 128)    0         
_________________________________________________________________
batch_normalization_2 (Batch (None, 14, 14, 6, 128)    512       
_________________________________________________________________
conv3d_3 (Conv3D)            (None, 12, 12, 4, 256)    884992    
_________________________________________________________________
max_pooling3d_3 (MaxPooling3 (None, 6, 6, 2, 256)      0         
_________________________________________________________________
batch_normalization_3 (Batch (None, 6, 6, 2, 256)      1024      
_________________________________________________________________
global_average_pooling3d (Gl (None, 256)               0         
_________________________________________________________________
dense (Dense)                (None, 512)               131584    
_________________________________________________________________
dropout (Dropout)            (None, 512)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 513       
=================================================================
Total params: 1,352,897
Trainable params: 1,351,873
Non-trainable params: 1,024
_________________________________________________________________
2021-02-02 08:27:26.194010: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 369098752 exceeds 10% of free system memory.
2021-02-02 08:27:26.397041: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 3355443200 exceeds 10% of free system memory.
Epoch 1/5
2021-02-02 08:27:30.705650: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2021-02-02 08:27:31.247841: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.11
2021-02-02 08:27:31.879674: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
400/400 - 105s - loss: 0.6529 - acc: 0.6325 - val_loss: 0.8511 - val_acc: 0.6705
Process finished with exit code 137 (interrupted by signal 9: SIGKILL)

Mustafa Mahmood · Accepted Answer

The code is now running flawlessly on Google Colab.

I think the limitation was with my GPU (an RTX 2080 Ti) compared with the Google Colab GPU (an Nvidia T4).

I preprocessed the data and saved it as NumPy arrays, then uploaded the arrays to Google Colab and ran the code on the preprocessed data. Now everything is working fine!
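
The preprocess-once, load-later workflow can be sketched as follows. This is a minimal illustration rather than the actual script: the file names are placeholders, and a small random array stands in for the preprocessed scans. `np.save` writes an array to a `.npy` file, and `np.load` with `mmap_mode="r"` opens it lazily, so the full set of volumes does not have to sit in RAM at once.

```python
import os
import tempfile

import numpy as np

# Stand-in for the preprocessed scans; the real array would be (800, 128, 128, 64).
x_train = np.random.rand(6, 16, 16, 8).astype(np.float32)
y_train = np.array([0, 1, 0, 1, 1, 0], dtype=np.int64)

out_dir = tempfile.mkdtemp()                   # placeholder output directory
x_path = os.path.join(out_dir, "xtrain.npy")   # placeholder file names
y_path = os.path.join(out_dir, "ytrain.npy")

# Step 1: run the slow preprocessing once and store the result on disk.
np.save(x_path, x_train)
np.save(y_path, y_train)

# Step 2: in the training script, open the scans memory-mapped (read-only).
x_loaded = np.load(x_path, mmap_mode="r")
y_loaded = np.load(y_path)

print(x_loaded.shape, x_loaded.dtype)  # (6, 16, 16, 8) float32

# Only the slices that are actually indexed get read into memory.
first_batch = np.asarray(x_loaded[:2])
```

Separating preprocessing from training also means the training process does not carry whatever memory the `.mhd` loading and resampling step left behind.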
