Reputation: 35
I am trying to create an alert in CloudWatch for when my SageMaker training jobs fail. According to the page linked here, I should by default see JobsFailed, JobsSucceeded, ... metrics in CloudWatch. But I am not seeing them; I only see CPU, disk usage, and memory usage metrics.
I would like to avoid writing my own Lambda to capture something that should already be there.
Maybe I'm missing some configuration. I have checked the IAM permissions of the role used by the training jobs, and it has the AmazonSageMakerFullAccess policy attached.
I haven't found a documentation or wiki page that explains how I should set up SageMaker to be able to see those Ground Truth metrics.
Do you have any idea how I could proceed? Thanks a lot.
Update: I have set enable_sagemaker_metrics=True on my Estimator and added the CloudWatchAgentAdminPolicy permissions to my IAM role, and it is still not working.
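To double-check what CloudWatch actually exposes, the metric names in a namespace can be listed programmatically instead of browsing the console. This is only a minimal sketch, assuming boto3 is installed and AWS credentials are configured. The helper names metric_names and list_namespace_metrics are made up for illustration, and the namespace /aws/sagemaker/TrainingJobs mentioned afterwards is an assumption about where the training-job instance metrics are published.

```python
from typing import Any, Dict, Iterable, List, Set


def metric_names(pages: Iterable[Dict[str, Any]]) -> List[str]:
    """Collect the distinct MetricName values from ListMetrics response pages."""
    names: Set[str] = set()
    for page in pages:
        # Each page is expected to carry a "Metrics" list; tolerate empty pages.
        for metric in page.get("Metrics", []):
            names.add(metric["MetricName"])
    return sorted(names)


def list_namespace_metrics(namespace: str) -> List[str]:
    """Ask CloudWatch for every metric name published under `namespace`.

    Hypothetical helper: it needs boto3 and valid AWS credentials to run.
    """
    import boto3  # imported lazily so metric_names() works without AWS access

    paginator = boto3.client("cloudwatch").get_paginator("list_metrics")
    return metric_names(paginator.paginate(Namespace=namespace))
```

Calling list_namespace_metrics("/aws/sagemaker/TrainingJobs") (assumed namespace) should return the same names the console shows, which makes it easy to confirm whether JobsFailed or JobsSucceeded appear anywhere.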
Upvotes: 0
Views: 21