ChemBot

Reputation: 77

How do I track the loss at each epoch using mlflow/tensorflow?

I want to use mlflow to track the development of a TensorFlow model. How do I log the loss at each epoch? I have written the following code:

mlflow.set_tracking_uri(tracking_uri)

mlflow.set_experiment("/deep_learning")
with mlflow.start_run():
    mlflow.log_param("batch_size", batch_size)
    mlflow.log_param("learning_rate", learning_rate)
    mlflow.log_param("epochs", epochs)
    mlflow.log_param("Optimizer", opt)
    mlflow.log_metric("train_loss", train_loss)
    mlflow.log_metric("val_loss", val_loss)
    mlflow.log_metric("test_loss", test_loss)
    mlflow.log_metric("test_mse", test_mse)
    mlflow.log_artifacts("./model")

If I set train_loss and val_loss to the per-epoch lists from the Keras history,

train_loss = history.history['loss']
val_loss = history.history['val_loss']

I get the following error:

mlflow.exceptions.MlflowException: Got invalid value [12.041399002075195] for metric 'train_loss' (timestamp=1649783654667). Please specify value as a valid double (64-bit floating point)

How do I save the loss and the val_loss at every epoch, so that I can visualise a learning curve within mlflow?

Upvotes: 1

Views: 3279

Answers (1)

I'mahdi

Reputation: 24049

As described in the MLflow documentation, you can use mlflow.tensorflow.autolog(). From the docs:

Enables (or disables) and configures autologging from Keras to MLflow. Autologging captures the following information:

fit() or fit_generator() parameters; optimizer name; learning rate; epsilon ...

For example:

# !pip install mlflow
import tensorflow as tf
import mlflow
import numpy as np


X_train = np.random.rand(100,100)
y_train = np.random.randint(0,10,100)
    

model = tf.keras.Sequential()
model.add(tf.keras.Input(shape=(100,)))  # shape must be a tuple: 100 features per sample
model.add(tf.keras.layers.Dense(256, activation='relu'))
model.add(tf.keras.layers.Dropout(rate=.4))
# softmax so the 10 outputs form a probability distribution, as the loss below expects
model.add(tf.keras.layers.Dense(10, activation='softmax'))
model.compile(loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
              optimizer='Adam', 
              metrics=['accuracy'])
model.summary()


mlflow.tensorflow.autolog()
history = model.fit(X_train, y_train, epochs=100, batch_size=50)
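Autologging records much more than the loss. If only the per-epoch metrics are wanted, one alternative is to log them from a callback while training runs. The following is a minimal sketch, not part of MLflow itself: make_epoch_logger and its log_metric argument are hypothetical names used for illustration, and it relies on mlflow.log_metric accepting an optional step argument and on tf.keras.callbacks.LambdaCallback calling on_epoch_end(epoch, logs).

```python
def make_epoch_logger(log_metric=None):
    """Return an on_epoch_end(epoch, logs) hook that logs every Keras metric.

    make_epoch_logger is a hypothetical helper, not an MLflow API.
    """
    if log_metric is None:
        # Imported lazily so the hook can be tried out without MLflow installed.
        import mlflow
        log_metric = mlflow.log_metric

    def on_epoch_end(epoch, logs=None):
        # Keras passes a dict such as {"loss": 1.23, "val_loss": 1.40}.
        for name, value in (logs or {}).items():
            # One scalar per call; step makes the epoch the x-axis of the curve.
            log_metric(name, float(value), step=epoch)

    return on_epoch_end


# Usage (model, X_train, y_train as defined above):
# logger = tf.keras.callbacks.LambdaCallback(on_epoch_end=make_epoch_logger())
# with mlflow.start_run():
#     model.fit(X_train, y_train, epochs=100, batch_size=50, callbacks=[logger])
```

With this approach the curves fill in while training is still running. A val_loss entry only appears if fit() is given validation data.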

Or, as you mention in the comment, you can point MLflow at your tracking server with mlflow.set_tracking_uri() and train inside a run, like below:

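# Point MLflow at a running tracking server (for example one started with `mlflow ui`).
mlflow.set_tracking_uri('http://127.0.0.1:5000')
tracking_uri = mlflow.get_tracking_uri()
with mlflow.start_run(run_name='PARENT_RUN') as parent_run:
    batch_size = 50
    # With autolog() enabled above, the per-epoch loss is recorded during fit().
    history = model.fit(X_train, y_train, epochs=2, batch_size=batch_size)
    mlflow.log_param("batch_size", batch_size)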
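To log the losses by hand instead: the error in the question comes from passing the whole history list to mlflow.log_metric, which accepts a single number per call. A minimal sketch, assuming history is the object returned by model.fit(); the helper name log_history and its log_metric argument are made up for illustration, while step is the optional argument of mlflow.log_metric that sets the x-axis position.

```python
def log_history(history_dict, log_metric=None):
    """Log each value of a Keras-style history dict as one metric point per epoch.

    log_history is a hypothetical helper, not an MLflow API.
    """
    if log_metric is None:
        # Imported lazily so the helper can be tried out without MLflow installed.
        import mlflow
        log_metric = mlflow.log_metric

    for name, values in history_dict.items():
        for epoch, value in enumerate(values):
            # A single float per call, with the epoch index as the step.
            log_metric(name, float(value), step=epoch)


# Usage inside an active run (model, X_train, y_train as defined above):
# with mlflow.start_run():
#     history = model.fit(X_train, y_train, epochs=2, validation_split=0.2)
#     log_history(history.history)
```

Each key of history.history (for example loss and val_loss) then appears as a curve over steps in the MLflow UI. If autologging is enabled at the same time, the same keys would likely be written twice, so it is probably best to pick one of the two approaches.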

To view the results, start the MLflow UI:

!mlflow ui

Output:

[....] [...] [INFO] Starting gunicorn 20.1.0
[....] [...] [INFO] Listening at: http://127.0.0.1:5000 (****)
[....] [...] [INFO] Using worker: sync
[....] [...] [INFO] Booting worker with pid: ****


Upvotes: 1
