I ran into a problem with AutoKeras while running an example from the book. The task was to generate an architecture for a model trained on the MNIST dataset (the "hello world" task for AutoKeras). I also had issues using my laptop GPU, so I had to add some extra code to enable explicit GPU usage.
import numpy as np
import tensorflow as tf
from tensorflow.keras.datasets import mnist
from tensorflow.python.keras.utils.data_utils import Sequence
import autokeras as ak
###### My special code here ##############
config = tf.compat.v1.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.compat.v1.Session(config=config)
##########################################
(x_train, y_train), (x_test, y_test) = mnist.load_data()
clf = ak.ImageClassifier(
    overwrite=True,
    max_trials=10)
##########################################
with tf.device('/gpu:0'):
##########################################
    clf.fit(x_train, y_train, epochs=2)
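As an aside, I am not sure whether the tf.compat.v1 Session above is the right way to get allow_growth applied under TF 2.x eager training; my understanding (possibly wrong) is that the TF2-native way to request on-demand GPU memory growth is roughly this sketch:
import tensorflow as tf

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    # Allocate GPU memory on demand instead of reserving it all up front.
    # Must be called before the GPU is initialized, i.e. before building any model.
    tf.config.experimental.set_memory_growth(gpus[0], True)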
Output (epochs set to 2 to get results faster):
Trial 1 Complete [00h 00m 20s]
val_loss: 0.058981552720069885
Best val_loss So Far: 0.058981552720069885
Total elapsed time: 00h 00m 20s
Search: Running Trial #2
Hyperparameter |Value |Best Value So Far
image_block_1/block_type|resnet |vanilla
image_block_1/normalize|True |True
image_block_1/augment|True |False
image_block_1/image_augmentation_1/horizontal_flip|True |None
image_block_1/image_augmentation_1/vertical_flip|False |None
image_block_1/image_augmentation_1/contrast_factor|0.0 |None
image_block_1/image_augmentation_1/rotation_factor|0.0 |None
image_block_1/image_augmentation_1/translation_factor|0.1 |None
image_block_1/image_augmentation_1/zoom_factor|0.0 |None
image_block_1/res_net_block_1/pretrained|True |None
image_block_1/res_net_block_1/version|resnet50 |None
image_block_1/res_net_block_1/trainable|True |None
image_block_1/res_net_block_1/imagenet_size|True |None
classification_head_1/spatial_reduction_1/reduction_type|global_avg|flatten
classification_head_1/dropout|0 |0.5
optimizer |adam |adam
learning_rate |1e-05 |0.001
Epoch 1/2
2/1500 [..............................] - ETA: 5:31 - loss: 2.4616 - accuracy: 0.1562WARNING:tensorflow:Callbacks method `on_train_batch_end` is slow compared to the batch time (batch time: 0.1631s vs `on_train_batch_end` time: 0.2793s). Check your callbacks.
3/1500 [..............................] - ETA: 7:11 - loss: 2.4400 - accuracy: 0.1667
---------------------------------------------------------------------------
ResourceExhaustedError Traceback (most recent call last)
<ipython-input-6-fc43cdbb1604> in <module>
1 with tf.device('/gpu:0'):
----> 2 clf.fit(x_train, y_train, epochs=2)
~/anaconda3/envs/ML/lib/python3.8/site-packages/autokeras/tasks/image.py in fit(self, x, y, epochs, callbacks, validation_split, validation_data, **kwargs)
152 **kwargs: Any arguments supported by keras.Model.fit.
153 """
--> 154 super().fit(
155 x=x,
156 y=y,
~/anaconda3/envs/ML/lib/python3.8/site-packages/autokeras/auto_model.py in fit(self, x, y, batch_size, epochs, callbacks, validation_split, validation_data, **kwargs)
277 )
278
--> 279 self.tuner.search(
280 x=dataset,
281 epochs=epochs,
~/anaconda3/envs/ML/lib/python3.8/site-packages/autokeras/engine/tuner.py in search(self, epochs, callbacks, fit_on_val_data, **fit_kwargs)
136 self.oracle.update_space(hp)
137
--> 138 super().search(epochs=epochs, callbacks=new_callbacks, **fit_kwargs)
139
140 # Train the best model use validation data.
~/anaconda3/envs/ML/lib/python3.8/site-packages/kerastuner/engine/base_tuner.py in search(self, *fit_args, **fit_kwargs)
129
130 self.on_trial_begin(trial)
--> 131 self.run_trial(trial, *fit_args, **fit_kwargs)
132 self.on_trial_end(trial)
133 self.on_search_end()
~/anaconda3/envs/ML/lib/python3.8/site-packages/kerastuner/engine/tuner.py in run_trial(self, trial, *fit_args, **fit_kwargs)
151 self._on_train_begin(model, trial.hyperparameters,
152 *fit_args, **copied_fit_kwargs)
--> 153 model.fit(*fit_args, **copied_fit_kwargs)
154
155 def _on_train_begin(model, hp, *fit_args, **fit_kwargs):
~/anaconda3/envs/ML/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py in _method_wrapper(self, *args, **kwargs)
106 def _method_wrapper(self, *args, **kwargs):
107 if not self._in_multi_worker_mode(): # pylint: disable=protected-access
--> 108 return method(self, *args, **kwargs)
109
110 # Running inside `run_distribute_coordinator` already.
~/anaconda3/envs/ML/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_batch_size, validation_freq, max_queue_size, workers, use_multiprocessing)
1096 batch_size=batch_size):
1097 callbacks.on_train_batch_begin(step)
-> 1098 tmp_logs = train_function(iterator)
1099 if data_handler.should_sync:
1100 context.async_wait()
~/anaconda3/envs/ML/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py in __call__(self, *args, **kwds)
778 else:
779 compiler = "nonXla"
--> 780 result = self._call(*args, **kwds)
781
782 new_tracing_count = self._get_tracing_count()
~/anaconda3/envs/ML/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py in _call(self, *args, **kwds)
805 # In this case we have created variables on the first call, so we run the
806 # defunned version which is guaranteed to never create variables.
--> 807 return self._stateless_fn(*args, **kwds) # pylint: disable=not-callable
808 elif self._stateful_fn is not None:
809 # Release the lock early so that multiple threads can perform the call
~/anaconda3/envs/ML/lib/python3.8/site-packages/tensorflow/python/eager/function.py in __call__(self, *args, **kwargs)
2827 with self._lock:
2828 graph_function, args, kwargs = self._maybe_define_function(args, kwargs)
-> 2829 return graph_function._filtered_call(args, kwargs) # pylint: disable=protected-access
2830
2831 @property
~/anaconda3/envs/ML/lib/python3.8/site-packages/tensorflow/python/eager/function.py in _filtered_call(self, args, kwargs, cancellation_manager)
1841 `args` and `kwargs`.
1842 """
-> 1843 return self._call_flat(
1844 [t for t in nest.flatten((args, kwargs), expand_composites=True)
1845 if isinstance(t, (ops.Tensor,
~/anaconda3/envs/ML/lib/python3.8/site-packages/tensorflow/python/eager/function.py in _call_flat(self, args, captured_inputs, cancellation_manager)
1921 and executing_eagerly):
1922 # No tape is watching; skip to running the function.
-> 1923 return self._build_call_outputs(self._inference_function.call(
1924 ctx, args, cancellation_manager=cancellation_manager))
1925 forward_backward = self._select_forward_and_backward_functions(
~/anaconda3/envs/ML/lib/python3.8/site-packages/tensorflow/python/eager/function.py in call(self, ctx, args, cancellation_manager)
543 with _InterpolateFunctionError(self):
544 if cancellation_manager is None:
--> 545 outputs = execute.execute(
546 str(self.signature.name),
547 num_outputs=self._num_outputs,
~/anaconda3/envs/ML/lib/python3.8/site-packages/tensorflow/python/eager/execute.py in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
57 try:
58 ctx.ensure_initialized()
---> 59 tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
60 inputs, attrs, num_outputs)
61 except core._NotOkStatusException as e:
ResourceExhaustedError: OOM when allocating tensor with shape[65536] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[node functional_1/global_average_pooling2d/Mean (defined at /home/biowar/anaconda3/envs/ML/lib/python3.8/site-packages/kerastuner/engine/tuner.py:153) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
[Op:__inference_train_function_37301]
Function call stack:
train_function
Output of nvidia-smi (during Trial 1):
Every 0,5s: nvidia-smi Nitro5: Sun Aug 30 12:59:30 2020
Sun Aug 30 12:59:31 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.57 Driver Version: 450.57 CUDA Version: 11.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 GeForce GTX 165... Off | 00000000:01:00.0 Off | N/A |
| N/A 47C P0 32W / N/A | 1101MiB / 3911MiB | 41% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1691 G /usr/lib/xorg/Xorg 4MiB |
| 0 N/A N/A 2362 G /usr/lib/xorg/Xorg 4MiB |
| 0 N/A N/A 7530 C ...conda3/envs/ML/bin/python 251MiB |
| 0 N/A N/A 37376 C ...conda3/envs/ML/bin/python 837MiB |
+-----------------------------------------------------------------------------+
Output of nvidia-smi (after Trial 2 started):
Every 0,5s: nvidia-smi Nitro5: Sun Aug 30 12:58:02 2020
Sun Aug 30 12:58:02 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.57 Driver Version: 450.57 CUDA Version: 11.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 GeForce GTX 165... Off | 00000000:01:00.0 Off | N/A |
| N/A 41C P8 1W / N/A | 3885MiB / 3911MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1691 G /usr/lib/xorg/Xorg 4MiB |
| 0 N/A N/A 2362 G /usr/lib/xorg/Xorg 4MiB |
| 0 N/A N/A 7530 C ...conda3/envs/ML/bin/python 251MiB |
| 0 N/A N/A 35239 C ...conda3/envs/ML/bin/python 3621MiB |
+-----------------------------------------------------------------------------+
How can I modify my code to prevent TensorFlow from consuming essentially 100% of my GPU memory after Trial 1 completes successfully?
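For concreteness, this is the kind of change I am asking about: a minimal, untested sketch (the memory_limit and batch_size values below are my own guesses) that caps how much GPU memory TensorFlow may allocate and lowers the batch size per trial:
import tensorflow as tf
from tensorflow.keras.datasets import mnist
import autokeras as ak

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    # Hard-cap TensorFlow at ~3 GB of the 4 GB card (the value is a guess).
    tf.config.experimental.set_virtual_device_configuration(
        gpus[0],
        [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=3072)])

(x_train, y_train), (x_test, y_test) = mnist.load_data()
clf = ak.ImageClassifier(overwrite=True, max_trials=10)
# batch_size is forwarded to keras.Model.fit (see the auto_model.py signature in the
# traceback above); a smaller batch should lower peak memory for the ResNet trial.
clf.fit(x_train, y_train, epochs=2, batch_size=16)
Would something along these lines be the right direction, or is there a better way?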