Reputation: 77
I want to use mlflow to track the development of a TensorFlow model. How do I log the loss at each epoch? I have written the following code:
mlflow.set_tracking_uri(tracking_uri)
mlflow.set_experiment("/deep_learning")
with mlflow.start_run():
    mlflow.log_param("batch_size", batch_size)
    mlflow.log_param("learning_rate", learning_rate)
    mlflow.log_param("epochs", epochs)
    mlflow.log_param("Optimizer", opt)
    mlflow.log_metric("train_loss", train_loss)
    mlflow.log_metric("val_loss", val_loss)
    mlflow.log_metric("test_loss", test_loss)
    mlflow.log_metric("test_mse", test_mse)
    mlflow.log_artifacts("./model")
If I change the train_loss and val_loss to
train_loss = history.history['loss']
val_loss = history.history['val_loss']
I get the following error:
mlflow.exceptions.MlflowException: Got invalid value [12.041399002075195] for metric 'train_loss' (timestamp=1649783654667). Please specify value as a valid double (64-bit floating point)
How do I save the loss and the val_loss at all epochs, so I can visualise a learning curve within mlflow?
Upvotes: 1
Views: 3279
Reputation: 24049
As you can read here, you can use mlflow.tensorflow.autolog(), which (from the docs):
Enables (or disables) and configures autologging from Keras to MLflow. Autologging captures the following information:
fit() or fit_generator() parameters; optimizer name; learning rate; epsilon ...
For example:
# !pip install mlflow
import tensorflow as tf
import mlflow
import numpy as np
X_train = np.random.rand(100,100)
y_train = np.random.randint(0,10,100)
model = tf.keras.Sequential()
model.add(tf.keras.Input(shape=(100,)))
model.add(tf.keras.layers.Dense(256, activation='relu'))
model.add(tf.keras.layers.Dropout(rate=.4))
model.add(tf.keras.layers.Dense(10, activation='sigmoid'))
model.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
              optimizer='Adam',
              metrics=['accuracy'])
model.summary()
# Enable autologging before calling fit() so the per-epoch metrics are captured
mlflow.tensorflow.autolog()
history = model.fit(X_train, y_train, epochs=100, batch_size=50)
Or, as you mentioned in the comment, you can use mlflow.set_tracking_uri() like below:
mlflow.set_tracking_uri('http://127.0.0.1:5000')
tracking_uri = mlflow.get_tracking_uri()
with mlflow.start_run(run_name='PARENT_RUN') as parent_run:
    batch_size = 50
    history = model.fit(X_train, y_train, epochs=2, batch_size=batch_size)
    mlflow.log_param("batch_size", batch_size)
To see the results, start the MLflow UI:
!mlflow ui
Output:
[....] [...] [INFO] Starting gunicorn 20.1.0
[....] [...] [INFO] Listening at: http://127.0.0.1:5000 (****)
[....] [...] [INFO] Using worker: sync
[....] [...] [INFO] Booting worker with pid: ****
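If you also want to read the per-epoch values back in Python (for example to plot the learning curve yourself), here is a rough sketch using MlflowClient; the run id below is a placeholder you would copy from the MLflow UI, and "loss" is the metric name Keras reports during training (adjust it if your run logs a different key, e.g. the manual "train_loss" above):

from mlflow.tracking import MlflowClient

client = MlflowClient()
run_id = "<run_id copied from the MLflow UI>"  # placeholder, not a real id

# get_metric_history returns one entry per logged step:
# m.step is the epoch index, m.value is the loss at that epoch.
for m in client.get_metric_history(run_id, "loss"):
    print(m.step, m.value)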
Upvotes: 1