Mustafa Mahmood

Reputation: 1

3D CNN using keras-tensorflow in PyCharm: Process finished with exit code 137 (interrupted by signal 9: SIGKILL)

I'm building a 3D CNN to classify the LUNA16 dataset (a CT scan dataset), using keras-tensorflow in PyCharm.

I'm following this code

and I only modified it to fit my (*.mhd) data (this is what I'm running now).

Each time I run the code, a different error comes up and stops the process, but all of the errors are about memory.
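
For context on the exit code in the title: shells and IDE runners commonly report a process killed by a signal as 128 plus the signal number, so 137 corresponds to signal 9 (SIGKILL). On Linux, one frequent sender of SIGKILL is the kernel's out-of-memory killer when system RAM runs out, which would be consistent with memory-related failures, although the log alone does not confirm who sent the signal. A minimal sketch of the arithmetic, assuming a POSIX system:

```python
import signal

# Common convention: exit code = 128 + signal number for signal-terminated processes.
SHELL_SIGNAL_OFFSET = 128

exit_code = 137  # value reported by PyCharm in this question
signal_number = exit_code - SHELL_SIGNAL_OFFSET

# signal.Signals maps the number back to its symbolic name on this platform.
signal_name = signal.Signals(signal_number).name
print(signal_number, signal_name)
```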

My model summary is

Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         [(None, 128, 128, 64, 1)] 0         
conv3d (Conv3D)              (None, 126, 126, 62, 64)  1792      
max_pooling3d (MaxPooling3D) (None, 63, 63, 31, 64)    0         
batch_normalization (BatchNo (None, 63, 63, 31, 64)    256       
conv3d_1 (Conv3D)            (None, 61, 61, 29, 64)    110656    
max_pooling3d_1 (MaxPooling3 (None, 30, 30, 14, 64)    0         
batch_normalization_1 (Batch (None, 30, 30, 14, 64)    256       
conv3d_2 (Conv3D)            (None, 28, 28, 12, 128)   221312    
max_pooling3d_2 (MaxPooling3 (None, 14, 14, 6, 128)    0         
batch_normalization_2 (Batch (None, 14, 14, 6, 128)    512       
conv3d_3 (Conv3D)            (None, 12, 12, 4, 256)    884992    
max_pooling3d_3 (MaxPooling3 (None, 6, 6, 2, 256)      0         
batch_normalization_3 (Batch (None, 6, 6, 2, 256)      1024      
global_average_pooling3d (Gl (None, 256)               0         
dense (Dense)                (None, 512)               131584    
dropout (Dropout)            (None, 512)               0         
dense_1 (Dense)              (None, 1)                 513       
Total params: 1,352,897
Trainable params: 1,351,873
Non-trainable params: 1,024
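
The parameter count is small, but the intermediate feature maps of a 3D CNN are not. As a rough, hedged estimate, the sketch below multiplies out the output shapes from the summary above, assuming float32 activations and a batch size of 2. It counts forward activations only; what TensorFlow actually allocates (gradients, convolution workspaces, allocator overhead) is not modelled here.

```python
# Output shapes copied from the model summary (batch dimension omitted).
layer_shapes = {
    "conv3d": (126, 126, 62, 64),
    "max_pooling3d": (63, 63, 31, 64),
    "batch_normalization": (63, 63, 31, 64),
    "conv3d_1": (61, 61, 29, 64),
    "max_pooling3d_1": (30, 30, 14, 64),
    "batch_normalization_1": (30, 30, 14, 64),
    "conv3d_2": (28, 28, 12, 128),
    "max_pooling3d_2": (14, 14, 6, 128),
    "batch_normalization_2": (14, 14, 6, 128),
    "conv3d_3": (12, 12, 4, 256),
    "max_pooling3d_3": (6, 6, 2, 256),
    "batch_normalization_3": (6, 6, 2, 256),
}

BYTES_PER_FLOAT32 = 4  # assumption: activations are stored as float32
BATCH_SIZE = 2         # batch size stated in the question


def activation_bytes(shape, batch_size=BATCH_SIZE):
    """Bytes needed to hold one layer's output for a whole batch."""
    elements = 1
    for dim in shape:
        elements *= dim
    return elements * batch_size * BYTES_PER_FLOAT32


total_bytes = 0
for name, shape in layer_shapes.items():
    size = activation_bytes(shape)
    total_bytes += size
    print(f"{name:24s} {size / 2**20:10.1f} MiB")

print(f"{'total (forward only)':24s} {total_bytes / 2**20:10.1f} MiB")
```

The first convolution dominates: its output alone holds 126 * 126 * 62 * 64 values per scan.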

The dimension of each CT scan is (128, 128, 64, 1)

The shapes of the training and validation sets are
xtrain = (800, 128, 128, 64) / xval = (88, 128, 128, 64) / ytrain = (800,) / yval = (88,)

The batch size is 2.
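
The array shapes above can be tied to the allocation warnings in the log further down. Assuming the arrays are stored as float32 (4 bytes per element, an assumption since the dtype is not shown), the sizes work out as in this short sketch:

```python
import math


def array_nbytes(shape, bytes_per_element=4):
    """Size in bytes of a dense array with the given shape."""
    return math.prod(shape) * bytes_per_element


xtrain_bytes = array_nbytes((800, 128, 128, 64))
xval_bytes = array_nbytes((88, 128, 128, 64))

print(xtrain_bytes)  # 3355443200
print(xval_bytes)    # 369098752
print(f"{(xtrain_bytes + xval_bytes) / 2**30:.2f} GiB for one copy of both arrays")
```

These two numbers are exactly the sizes named in the "Allocation of ... exceeds 10% of free system memory" warnings. The 3355443200-byte warning appears more than once in the log, which may mean that more than one copy of the training array is created in host memory; that is an inference from the log, not something it states.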

I'm monitoring my model with wandb; you can check it out here.
It shows that the model is consuming 100% of both system memory and GPU memory just before it stops working.
picture of gpu memory Allocated %
picture of system memory allocated %
I know there are tons of answers about the same problem, but none of them fixes my problem.
The CNN is not big, the batch size is very small (only 2), and my data is only 888 CT scans. My PC has 32 GB of memory and an RTX 2080 Ti GPU.
The full log is here

import sys; print('Python %s on %s' % (sys.version, sys.platform))
sys.path.extend(['/home/mustafa/home/mustafa/project/LUNAMASK', '/home/mustafa/home/mustafa/project/LUNAMASK'])
PyDev console: starting.
Python 3.8.7 (default, Dec 21 2020, 20:10:35) 
[GCC 7.5.0] on linux
runfile('/home/mustafa/home/mustafa/project/LUNAMASK/', wdir='/home/mustafa/home/mustafa/project/LUNAMASK')
2021-02-02 05:40:34.999468: I tensorflow/stream_executor/platform/default/] Successfully opened dynamic library
wandb: Currently logged in as: mustafa-ms (use `wandb login --relogin` to force relogin)
2021-02-02 05:40:37.643336: I tensorflow/stream_executor/platform/default/] Successfully opened dynamic library
wandb: Tracking run with wandb version 0.10.15
wandb: Syncing run clean-violet-3
wandb: ⭐️ View project at
wandb: 🚀 View run at
wandb: Run data is saved locally in /home/mustafa/home/mustafa/project/LUNAMASK/wandb/run-20210202_054036-4y03vu5s
wandb: Run `wandb offline` to turn off syncing.
y train length  800
y test length  88
xtrain =  (800, 128, 128, 64)
xval =  (88, 128, 128, 64)
ytrain =  (800,)
yval =  (88,)
2021-02-02 08:27:05.599099: I tensorflow/compiler/jit/] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-02-02 08:27:05.606801: I tensorflow/stream_executor/platform/default/] Successfully opened dynamic library
2021-02-02 08:27:05.657391: I tensorflow/stream_executor/cuda/] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-02-02 08:27:05.658293: I tensorflow/core/common_runtime/gpu/] Found device 0 with properties: 
pciBusID: 0000:09:00.0 name: GeForce RTX 2080 Ti computeCapability: 7.5
coreClock: 1.605GHz coreCount: 68 deviceMemorySize: 10.76GiB deviceMemoryBandwidth: 573.69GiB/s
2021-02-02 08:27:05.658325: I tensorflow/stream_executor/platform/default/] Successfully opened dynamic library
2021-02-02 08:27:05.667884: I tensorflow/stream_executor/platform/default/] Successfully opened dynamic library
2021-02-02 08:27:05.667982: I tensorflow/stream_executor/platform/default/] Successfully opened dynamic library
2021-02-02 08:27:05.674032: I tensorflow/stream_executor/platform/default/] Successfully opened dynamic library
2021-02-02 08:27:05.676356: I tensorflow/stream_executor/platform/default/] Successfully opened dynamic library
2021-02-02 08:27:05.684058: I tensorflow/stream_executor/platform/default/] Successfully opened dynamic library
2021-02-02 08:27:05.686346: I tensorflow/stream_executor/platform/default/] Successfully opened dynamic library
2021-02-02 08:27:05.687068: I tensorflow/stream_executor/platform/default/] Successfully opened dynamic library
2021-02-02 08:27:05.687204: I tensorflow/stream_executor/cuda/] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-02-02 08:27:05.688185: I tensorflow/stream_executor/cuda/] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-02-02 08:27:05.689043: I tensorflow/core/common_runtime/gpu/] Adding visible gpu devices: 0
2021-02-02 08:27:05.690061: I tensorflow/compiler/jit/] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-02-02 08:27:05.690188: I tensorflow/stream_executor/cuda/] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-02-02 08:27:05.691084: I tensorflow/core/common_runtime/gpu/] Found device 0 with properties: 
pciBusID: 0000:09:00.0 name: GeForce RTX 2080 Ti computeCapability: 7.5
coreClock: 1.605GHz coreCount: 68 deviceMemorySize: 10.76GiB deviceMemoryBandwidth: 573.69GiB/s
2021-02-02 08:27:05.691117: I tensorflow/stream_executor/platform/default/] Successfully opened dynamic library
2021-02-02 08:27:05.691137: I tensorflow/stream_executor/platform/default/] Successfully opened dynamic library
2021-02-02 08:27:05.691152: I tensorflow/stream_executor/platform/default/] Successfully opened dynamic library
2021-02-02 08:27:05.691165: I tensorflow/stream_executor/platform/default/] Successfully opened dynamic library
2021-02-02 08:27:05.691179: I tensorflow/stream_executor/platform/default/] Successfully opened dynamic library
2021-02-02 08:27:05.691192: I tensorflow/stream_executor/platform/default/] Successfully opened dynamic library
2021-02-02 08:27:05.691205: I tensorflow/stream_executor/platform/default/] Successfully opened dynamic library
2021-02-02 08:27:05.691218: I tensorflow/stream_executor/platform/default/] Successfully opened dynamic library
2021-02-02 08:27:05.691292: I tensorflow/stream_executor/cuda/] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-02-02 08:27:05.692206: I tensorflow/stream_executor/cuda/] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-02-02 08:27:05.693051: I tensorflow/core/common_runtime/gpu/] Adding visible gpu devices: 0
2021-02-02 08:27:05.693086: I tensorflow/stream_executor/platform/default/] Successfully opened dynamic library
2021-02-02 08:27:06.001440: I tensorflow/core/common_runtime/gpu/] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-02-02 08:27:06.001467: I tensorflow/core/common_runtime/gpu/]      0 
2021-02-02 08:27:06.001473: I tensorflow/core/common_runtime/gpu/] 0:   N 
2021-02-02 08:27:06.001663: I tensorflow/stream_executor/cuda/] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-02-02 08:27:06.002169: I tensorflow/stream_executor/cuda/] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-02-02 08:27:06.002643: I tensorflow/stream_executor/cuda/] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-02-02 08:27:06.003097: I tensorflow/core/common_runtime/gpu/] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 9508 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:09:00.0, compute capability: 7.5)
2021-02-02 08:27:06.004312: W tensorflow/core/framework/] Allocation of 3355443200 exceeds 10% of free system memory.
2021-02-02 08:27:06.983079: W tensorflow/core/framework/] Allocation of 369098752 exceeds 10% of free system memory.
2021-02-02 08:27:07.406900: W tensorflow/core/framework/] Allocation of 3355443200 exceeds 10% of free system memory.
2021-02-02 08:27:09.210752: I tensorflow/compiler/mlir/] None of the MLIR optimization passes are enabled (registered 2)
2021-02-02 08:27:09.229323: I tensorflow/core/platform/profile_utils/] CPU Frequency: 3699750000 Hz
Dimension of the CT scan is: (128, 128, 64, 1)
Model: "3dcnn"
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         [(None, 128, 128, 64, 1)] 0         
conv3d (Conv3D)              (None, 126, 126, 62, 64)  1792      
max_pooling3d (MaxPooling3D) (None, 63, 63, 31, 64)    0         
batch_normalization (BatchNo (None, 63, 63, 31, 64)    256       
conv3d_1 (Conv3D)            (None, 61, 61, 29, 64)    110656    
max_pooling3d_1 (MaxPooling3 (None, 30, 30, 14, 64)    0         
batch_normalization_1 (Batch (None, 30, 30, 14, 64)    256       
conv3d_2 (Conv3D)            (None, 28, 28, 12, 128)   221312    
max_pooling3d_2 (MaxPooling3 (None, 14, 14, 6, 128)    0         
batch_normalization_2 (Batch (None, 14, 14, 6, 128)    512       
conv3d_3 (Conv3D)            (None, 12, 12, 4, 256)    884992    
max_pooling3d_3 (MaxPooling3 (None, 6, 6, 2, 256)      0         
batch_normalization_3 (Batch (None, 6, 6, 2, 256)      1024      
global_average_pooling3d (Gl (None, 256)               0         
dense (Dense)                (None, 512)               131584    
dropout (Dropout)            (None, 512)               0         
dense_1 (Dense)              (None, 1)                 513       
Total params: 1,352,897
Trainable params: 1,351,873
Non-trainable params: 1,024
2021-02-02 08:27:26.194010: W tensorflow/core/framework/] Allocation of 369098752 exceeds 10% of free system memory.
2021-02-02 08:27:26.397041: W tensorflow/core/framework/] Allocation of 3355443200 exceeds 10% of free system memory.
Epoch 1/5
2021-02-02 08:27:30.705650: I tensorflow/stream_executor/platform/default/] Successfully opened dynamic library
2021-02-02 08:27:31.247841: I tensorflow/stream_executor/platform/default/] Successfully opened dynamic library
2021-02-02 08:27:31.879674: I tensorflow/stream_executor/platform/default/] Successfully opened dynamic library
400/400 - 105s - loss: 0.6529 - acc: 0.6325 - val_loss: 0.8511 - val_acc: 0.6705
Process finished with exit code 137 (interrupted by signal 9: SIGKILL)

Upvotes: 0

Views: 525

Answers (1)

Mustafa Mahmood

Reputation: 1

The code is now running flawlessly on Google Colab.

I think the limitation was with the GPU (mine is an RTX 2080 Ti, whereas Google Colab provides an Nvidia T4).

I preprocessed the data once and saved it as NumPy arrays, then uploaded the arrays to Google Colab and ran the part of the code that comes after preprocessing. Now everything is working fine!
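
A minimal sketch of that save-then-load workflow, using small random stand-in arrays instead of the real CT volumes. The file names and the helper functions `save_preprocessed` and `iter_batches` are hypothetical. Loading with `mmap_mode="r"` and yielding one batch at a time is an addition here, not something described in the answer; it is one way to avoid holding the whole training array in RAM.

```python
import os
import tempfile

import numpy as np


def save_preprocessed(volumes, labels, out_dir):
    """Save preprocessed volumes and labels as .npy files (hypothetical file names)."""
    x_path = os.path.join(out_dir, "xtrain.npy")
    y_path = os.path.join(out_dir, "ytrain.npy")
    np.save(x_path, volumes.astype(np.float32))
    np.save(y_path, labels)
    return x_path, y_path


def iter_batches(x_path, y_path, batch_size=2):
    """Yield (volumes, labels) batches without loading the full volume array."""
    x = np.load(x_path, mmap_mode="r")  # memory-mapped: data stays on disk until sliced
    y = np.load(y_path)
    for start in range(0, len(x), batch_size):
        stop = start + batch_size
        # Copy only this batch into RAM and add the trailing channel axis.
        yield np.array(x[start:stop])[..., np.newaxis], y[start:stop]


# Stand-in data: 6 tiny "scans" instead of 800 volumes of shape (128, 128, 64).
rng = np.random.default_rng(0)
volumes = rng.random((6, 8, 8, 4))
labels = rng.integers(0, 2, size=6)

with tempfile.TemporaryDirectory() as tmp_dir:
    x_path, y_path = save_preprocessed(volumes, labels, tmp_dir)
    batches = list(iter_batches(x_path, y_path, batch_size=2))

for batch_x, batch_y in batches:
    print(batch_x.shape, batch_x.dtype, batch_y.shape)
```

Saving the arrays once also means the preprocessing step does not have to be repeated on every training run.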

Upvotes: 0
