Reputation: 9198
I've implemented validation_epoch_end to produce and log metrics, and when I run trainer.validate, the metrics appear in my notebook.
However, when I run trainer.fit, only the training metrics appear, not the validation ones.
The validation step is still being run (the validation code calls a print statement, and its output does appear), but the validation metrics don't appear, even though they're logged. Or, if they do appear, the next epoch immediately erases them, so I can't see them.
(TensorBoard, likewise, does see the validation metrics.)
How can I see the validation epoch end metrics in a notebook, as each epoch occurs?
Upvotes: 5
Views: 4989
Reputation: 3426
You could do the following. Let's say you have the following LightningModule:
# Imports assumed here; the original snippet omitted them
import torch
from torch.nn import functional as F
from pytorch_lightning import LightningModule


class MNISTModel(LightningModule):
    def __init__(self):
        super().__init__()
        self.l1 = torch.nn.Linear(28 * 28, 10)

    def forward(self, x):
        # Flatten each image to a vector before the linear layer
        return torch.relu(self.l1(x.view(x.size(0), -1)))

    def training_step(self, batch, batch_nb):
        x, y = batch
        loss = F.cross_entropy(self(x), y)
        # prog_bar=True will display the value on the progress bar statically for the last complete train epoch
        self.log("train_loss", loss, on_step=False, on_epoch=True, prog_bar=True)
        return loss

    def validation_step(self, batch, batch_nb):
        x, y = batch
        loss = F.cross_entropy(self(x), y)
        # prog_bar=True will display the value on the progress bar statically for the last complete validation epoch
        self.log("val_loss", loss, on_step=False, on_epoch=True, prog_bar=True)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=0.02)
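To make on_step=False, on_epoch=True a bit more concrete, here is a toy, pure-Python sketch of the idea: values logged per batch are only accumulated, and a single reduced value (a plain mean here) is published once the epoch ends. EpochMeanLogger, log, end_epoch and progress_bar are hypothetical names for illustration, not Lightning API, and Lightning's real implementation is more involved.

```python
class EpochMeanLogger:
    """Toy stand-in for logging with on_step=False, on_epoch=True."""

    def __init__(self):
        self._sums = {}
        self._counts = {}
        self.progress_bar = {}  # values that would be shown on the bar

    def log(self, name, value):
        # Called once per batch: accumulate only, publish nothing yet.
        self._sums[name] = self._sums.get(name, 0.0) + float(value)
        self._counts[name] = self._counts.get(name, 0) + 1

    def end_epoch(self):
        # Called once per epoch: reduce to the mean and publish it.
        for name, total in self._sums.items():
            self.progress_bar[name] = total / self._counts[name]
        self._sums.clear()
        self._counts.clear()


logger = EpochMeanLogger()
for batch_loss in [1.0, 0.5, 0.75]:  # made-up per-batch losses
    logger.log("val_loss", batch_loss)

shown_during_epoch = dict(logger.progress_bar)  # still empty mid-epoch
logger.end_epoch()
shown_after_epoch = dict(logger.progress_bar)   # now holds the epoch mean
print(shown_during_epoch, shown_after_epoch)
```

This is why an epoch-level value sits unchanged on the bar for a whole epoch: nothing new is published until the next epoch finishes.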
The trick is to use prog_bar=True in combination with on_step and on_epoch, depending on when you want the progress bar to update. So, in this case, when training:
# Train the model ⚡
trainer.fit(mnist_model, MNIST_dm)
you will see:
Epoch 4: 100% -------------------------- 939/939 [00:09<00:00, 94.51it/s, loss=0.636, v_num=4, val_loss=0.743, train_loss=0.726]
Here, loss updates on every batch, since it is the step loss. val_loss and train_loss, however, are static values that only change after each validation or training epoch, respectively.
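If you would rather keep a permanent line per epoch in the notebook output (so the next epoch's bar can't overwrite it), one option is a small callback that prints whatever has been logged. This is only a sketch under a few assumptions: PrintValidationMetrics is a hypothetical name, it assumes your validation metric names start with "val", and it assumes the pytorch_lightning import path (newer releases also ship the same class under lightning.pytorch). It reads trainer.callback_metrics in the on_validation_end hook.

```python
try:
    from pytorch_lightning import Callback
except ImportError:  # fallback so the sketch can be read and run without Lightning
    Callback = object


class PrintValidationMetrics(Callback):
    """Print one line of validation metrics after each validation run."""

    def __init__(self, prefix="val"):
        super().__init__()
        self.prefix = prefix  # assumption: validation metrics are named "val..."
        self.lines = []       # keep the printed lines around for later inspection

    def on_validation_end(self, trainer, pl_module):
        # Skip the short sanity-check pass that runs before training starts.
        if getattr(trainer, "sanity_checking", False):
            return
        metrics = {
            name: float(value)  # works for Python floats and scalar tensors
            for name, value in trainer.callback_metrics.items()
            if name.startswith(self.prefix)
        }
        line = f"epoch {trainer.current_epoch}: " + ", ".join(
            f"{name}={value:.4f}" for name, value in sorted(metrics.items())
        )
        self.lines.append(line)
        print(line)
```

You would pass it in when building the trainer, e.g. Trainer(callbacks=[PrintValidationMetrics()]), and then call trainer.fit as before. Since the print output from your validation code already shows up in the notebook, these lines should show up the same way.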
Upvotes: 3