Boomer

Reputation: 392

Keras-tuner gets stuck after arbitrary number of trials using WSL

I recently had to switch to WSL 2 in order to enable GPU computing with TensorFlow. I had a piece of code working on Windows (CPU) that tunes three hyperparameters using keras-tuner. However, after switching to WSL 2, the tuning suddenly stops after an arbitrary number of trials:

[Screenshot: state of keras-tuner]

Sometimes it gets stuck after 3 trials, sometimes after 2. It doesn't crash; it just stays at epoch 1/3, so there is no error message. This is the tuning code (which worked on Windows):

import tensorflow as tf
import keras_tuner as kt

tuner = kt.Hyperband(build_tune_model,
                     objective='val_accuracy',
                     max_epochs=25,
                     factor=3,
                     directory='keras_tuner',
                     project_name='hyperband_tune')

early_stopping = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=5)
tensorboard_callback = tf.keras.callbacks.TensorBoard("/tmp/tb_logs")

tuner.search(
    ds_train,
    epochs=400,
    validation_data=ds_valid,
    steps_per_epoch=train_size // BATCH_SIZE,
    validation_steps=valid_size,
    callbacks=[early_stopping, tensorboard_callback]
    )

Things I have tried:

  1. Waiting for an hour (previous trials took less than 4 min to complete)
  2. Trying different tuners (Bayesian optimization and Hyperband)
  3. Checking the PC's performance stats. The CPU is in use (53%) while the GPU is completely idle (which makes sense, as the epoch never starts). The CPU usage drops when I interrupt the kernel.

[Screenshot: PC performance stats]

Extra info:

Worth noting: I get the following warnings when loading TensorFlow that I did not get before switching to WSL 2:

Could one of these warnings be the issue? Also, I use Jupyter Notebook; I will try running my code from the terminal to see if that works.
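For the terminal run, my plan is roughly the following (`tuning.ipynb` is a placeholder for my notebook's actual name):

```shell
# Export the notebook as a plain Python script, then run it from the
# WSL 2 terminal with TensorFlow's most verbose C++ logging enabled
# (TF_CPP_MIN_LOG_LEVEL=0), so a stall hopefully leaves some trace.
jupyter nbconvert --to script tuning.ipynb
TF_CPP_MIN_LOG_LEVEL=0 python tuning.py
```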

I would greatly appreciate any help. If any information is missing, please let me know. Thank you in advance.

Upvotes: 0

Views: 110

Answers (0)
