Hi, I am new to AWS SageMaker. I am trying to deploy a custom time-series model on SageMaker, so I built a Docker image from the SageMaker terminal. But when I try to create a training job, it fails with an error. I have been struggling with this for the past four days; could anyone please help? Here is my code:
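import sagemaker as sage
# image must be the full ECR URI of the pushed training image
# (<account-id>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>);
# role, s3Bucket, sess and upload_data are defined earlier in the notebook.
lstm = sage.estimator.Estimator(image,
                                role, 1, 'ml.m4.xlarge',  # role, instance count, instance type
                                output_path='s3://' + s3Bucket,
                                sagemaker_session=sess)
lstm.fit(upload_data)  # upload_data: location of the training data (typically an S3 URI)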
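For reference, the `image` passed to the estimator has to be the full ECR image URI (registry, region, repository and tag), and the repository it names must exist in the same region as the training job. A minimal sketch of building that URI, assuming the standard `amazonaws.com` partition (China regions use a different suffix). The helper name `ecr_image_uri` and the region `us-east-1` are hypothetical; the account ID and repository name come from the error message below. It avoids f-strings and type hints so it also runs in the Python 2.7 notebook environment shown in the traceback.

```python
def ecr_image_uri(account_id, region, repository, tag="latest"):
    """Build the full ECR image URI for a custom SageMaker training image.

    Assumes the standard AWS partition (amazonaws.com).
    """
    if not (account_id.isdigit() and len(account_id) == 12):
        raise ValueError("AWS account IDs are 12 digits: %r" % account_id)
    return "{}.dkr.ecr.{}.amazonaws.com/{}:{}".format(account_id, region, repository, tag)


# Hypothetical region; in a notebook you would typically look these values up with
# boto3.client("sts").get_caller_identity()["Account"] and
# boto3.session.Session().region_name instead of hard-coding them.
image = ecr_image_uri("534860077983", "us-east-1", "sagemaker-model")
print(image)  # 534860077983.dkr.ecr.us-east-1.amazonaws.com/sagemaker-model:latest
```

The image has to have been pushed to exactly this repository and tag (for example with `docker push` after tagging it with this URI) before `fit()` can pull it.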
Here is the error. I have attached the ECR full-access policy to the SageMaker IAM role, and everything is in the same region.
ClientError                               Traceback (most recent call last)
<ipython-input-48-1d7f3ff70f18> in <module>()
4 sagemaker_session=sess)
5
----> 6 lstm.fit(upload_data)
/home/ec2-user/anaconda3/envs/tensorflow_p27/lib/python2.7/site-packages/sagemaker/estimator.pyc in fit(self, inputs, wait, logs, job_name, experiment_config)
472 self._prepare_for_training(job_name=job_name)
473
--> 474 self.latest_training_job = _TrainingJob.start_new(self, inputs, experiment_config)
475 self.jobs.append(self.latest_training_job)
476 if wait:
/home/ec2-user/anaconda3/envs/tensorflow_p27/lib/python2.7/site-packages/sagemaker/estimator.pyc in start_new(cls, estimator, inputs, experiment_config)
1036 train_args["enable_sagemaker_metrics"] = estimator.enable_sagemaker_metrics
1037
-> 1038 estimator.sagemaker_session.train(**train_args)
1039
1040 return cls(estimator.sagemaker_session, estimator._current_job_name)
/home/ec2-user/anaconda3/envs/tensorflow_p27/lib/python2.7/site-packages/sagemaker/session.pyc in train(self, input_mode, input_config, role, job_name, output_config, resource_config, vpc_config, hyperparameters, stop_condition, tags, metric_definitions, enable_network_isolation, image, algorithm_arn, encrypt_inter_container_traffic, train_use_spot_instances, checkpoint_s3_uri, checkpoint_local_path, experiment_config, debugger_rule_configs, debugger_hook_config, tensorboard_output_config, enable_sagemaker_metrics)
588 LOGGER.info("Creating training-job with name: %s", job_name)
589 LOGGER.debug("train request: %s", json.dumps(train_request, indent=4))
--> 590 self.sagemaker_client.create_training_job(**train_request)
591
592 def process(
/home/ec2-user/anaconda3/envs/tensorflow_p27/lib/python2.7/site-packages/botocore/client.pyc in _api_call(self, *args, **kwargs)
314 "%s() only accepts keyword arguments." % py_operation_name)
315 # The "self" in this scope is referring to the BaseClient.
--> 316 return self._make_api_call(operation_name, kwargs)
317
318 _api_call.__name__ = str(py_operation_name)
/home/ec2-user/anaconda3/envs/tensorflow_p27/lib/python2.7/site-packages/botocore/client.pyc in _make_api_call(self, operation_name, api_params)
624 error_code = parsed_response.get("Error", {}).get("Code")
625 error_class = self.exceptions.from_code(error_code)
--> 626 raise error_class(parsed_response, operation_name)
627 else:
628 return parsed_response
ClientError: An error occurred (ValidationException) when calling the CreateTrainingJob operation: Cannot find repository: sagemaker-model in registry ID: 534860077983 Please check if your ECR repository exists and role arn:aws:iam::534860077983:role/service-role/AmazonSageMaker-ExecutionRole-20190508T215284 has proper pull permissions for SageMaker: ecr:BatchCheckLayerAvailability, ecr:BatchGetImage, ecr:GetDownloadUrlForLayer
TL;DR: It looks like the image URI you pass to the SageMaker estimator does not point to a valid ECR repository. Does the repository actually exist in that account and region?
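One way to test the "maybe the repository doesn't exist" theory is to ask ECR directly, using the registry ID and repository name from the error message. A minimal sketch, assuming boto3; the helper name `check_ecr_image` is hypothetical, while `describe_repositories`, `describe_images` and the modeled `RepositoryNotFoundException` / `ImageNotFoundException` errors are part of the boto3 ECR client. The client must be created in the same region as the training job, because ECR repositories are regional.

```python
def check_ecr_image(ecr_client, registry_id, repository, tag="latest"):
    """Return a short diagnosis of whether repository:tag exists in ECR.

    ecr_client is expected to be boto3.client("ecr", region_name=<training job region>).
    """
    try:
        # Fails if the repository does not exist in this registry/region.
        ecr_client.describe_repositories(registryId=registry_id,
                                         repositoryNames=[repository])
    except ecr_client.exceptions.RepositoryNotFoundException:
        return "repository %r not found in registry %s" % (repository, registry_id)
    try:
        # Fails if the repository exists but the tag was never pushed.
        ecr_client.describe_images(registryId=registry_id,
                                   repositoryName=repository,
                                   imageIds=[{"imageTag": tag}])
    except ecr_client.exceptions.ImageNotFoundException:
        return "repository exists but has no image tagged %r" % tag
    return "ok"

# Usage (needs AWS credentials; the region is an assumption):
#   import boto3
#   ecr = boto3.client("ecr", region_name="us-east-1")
#   print(check_ecr_image(ecr, "534860077983", "sagemaker-model"))
```

If this reports that the repository is missing, the image was most likely pushed to a different region or repository name than the one in the estimator's image URI.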
Also make sure the repository's permissions allow the principal sagemaker.amazonaws.com to perform ecr:BatchCheckLayerAvailability, ecr:BatchGetImage, and ecr:GetDownloadUrlForLayer.
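A repository policy along those lines could look like the sketch below. It grants the three pull actions named in the error message to the SageMaker service principal; the statement ID `AllowSageMakerPull`, the region and the repository name in the usage comment are hypothetical. `set_repository_policy(repositoryName=..., policyText=...)` is a real boto3 ECR call, but it only works on a repository that already exists, so it cannot fix a missing repository.

```python
import json

# The three pull actions listed in the SageMaker error message.
SAGEMAKER_PULL_ACTIONS = [
    "ecr:BatchCheckLayerAvailability",
    "ecr:BatchGetImage",
    "ecr:GetDownloadUrlForLayer",
]


def sagemaker_pull_policy():
    """Return an ECR repository policy (JSON text) letting SageMaker pull images."""
    return json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "AllowSageMakerPull",  # hypothetical statement ID
            "Effect": "Allow",
            "Principal": {"Service": "sagemaker.amazonaws.com"},
            "Action": SAGEMAKER_PULL_ACTIONS,
        }],
    }, indent=2)


# Usage (needs AWS credentials; region and repository name are assumptions):
#   import boto3
#   ecr = boto3.client("ecr", region_name="us-east-1")
#   ecr.set_repository_policy(repositoryName="sagemaker-model",
#                             policyText=sagemaker_pull_policy())
print(sagemaker_pull_policy())
```

The execution role named in the error still needs the same actions in its own IAM policy; the repository policy is an additional grant on the repository side.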