Ashwanth Karibindi

Reputation: 1

AWS SageMaker T5 or Hugging Face model training issue

I am trying to train a T5 conditional generation model in SageMaker. It runs fine when I pass the arguments directly in the notebook, but it does not learn anything when I use an estimator with a train.py script. I followed the documentation provided by Hugging Face as well as AWS, but we are still facing the issue: the job reports that training is completed and saves the model within 663 seconds, whatever the size of the dataset. Kindly give suggestions for this.

Upvotes: 0

Views: 109

Answers (1)

Gili Nachum

Reputation: 5578

Check the Amazon CloudWatch logs to see what took place during training (the stdout/stderr of train.py). This utility can help with downloading the logs to your local machine or notebook.
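If you would rather pull the logs yourself, here is a minimal sketch using the CloudWatch Logs API through boto3. It assumes the job writes to the default SageMaker log group `/aws/sagemaker/TrainingJobs` and that its log streams are prefixed with the training job name. The function name `fetch_training_logs` is made up for illustration, and the client is passed in so that boto3 is only needed where you actually call it.

```python
def fetch_training_logs(logs_client, job_name,
                        log_group="/aws/sagemaker/TrainingJobs"):
    """Return the log messages of a training job, oldest first per stream.

    logs_client is expected to be a CloudWatch Logs client, e.g.
    boto3.client("logs"). The default log group is an assumption about
    a standard SageMaker setup.
    """
    messages = []

    # One stream per training instance; each stream name starts with the job name.
    # Note: only the first page of streams is read, which is usually enough
    # for a single job.
    streams = logs_client.describe_log_streams(
        logGroupName=log_group,
        logStreamNamePrefix=job_name,
    )

    for stream in streams["logStreams"]:
        kwargs = {
            "logGroupName": log_group,
            "logStreamName": stream["logStreamName"],
            "startFromHead": True,  # read from the oldest event forward
        }
        while True:
            resp = logs_client.get_log_events(**kwargs)
            messages.extend(event["message"] for event in resp["events"])

            # At the end of the stream the API hands back the same token
            # that was passed in, which is the signal to stop.
            token = resp["nextForwardToken"]
            if token == kwargs.get("nextToken"):
                break
            kwargs["nextToken"] = token

    return messages
```

A call would look like `fetch_training_logs(boto3.client("logs"), "my-training-job")`, where the job name is a placeholder. In the output, look for the number of training examples and steps that train.py reports, since a run that takes the same time for every dataset size may not be reading the data you expect.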

Upvotes: 0
