Reputation: 233
I am using a Docker image that contains all the required files. I push it to AWS ECR so that I can pass the image to a SageMaker estimator. I have set train.py as the entrypoint in the Dockerfile:
ENTRYPOINT ["python3", "-m","train"]
This works as expected locally with docker run -it image, but when I run the training job I get this error:
Training - Training image download completed. Training in progress...usage: train.py [-h] [--epochs EPOCHS] [--learning_rate LEARNING_RATE]
[--max_sequence_length MAX_SEQUENCE_LENGTH]
[--train_batch_size TRAIN_BATCH_SIZE]
[--valid_batch_size VALID_BATCH_SIZE]
train.py: error: unrecognized arguments: train
The training job is created with a SageMaker estimator:
estimator = sagemaker.estimator.Estimator(image,  # docker image
                                          role,
                                          train_instance_count=1,
                                          train_instance_type='ml.p2.xlarge',
                                          output_path=output_path,
                                          hyperparameters=hyperparameters,
                                          sagemaker_session=session
                                          )
The main block of train.py:
if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument('--epochs', type=int, default=2)
    parser.add_argument('--learning_rate', type=float, default=2e-5)
    parser.add_argument('--max_sequence_length', type=int, default=512)
    parser.add_argument('--train_batch_size', type=int, default=12)
    parser.add_argument('--valid_batch_size', type=int, default=8)
    # SageMaker environment variables.
    #parser.add_argument('--hosts', type=str, default=os.environ['SM_HOSTS'])
    #parser.add_argument('--current_host', type=str, default=os.environ['SM_CURRENT_HOST'])
    # Parse command-line args and run main.
    args = parser.parse_args()
    # Get SageMaker host information from runtime environment variables
    #sm_hosts = json.loads(args.hosts)
    #sm_current_host = args.current_host
    train(args)
From the SageMaker docs I found that a training job runs the image as docker run image train, and when I tried the same thing locally I got the same error.
Upvotes: 3
Views: 1696
Reputation: 2765
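You don't need to define the ENTRYPOINT at all. What I do is simply have a file named train (with no file extension) that contains my training code. Be sure to make it executable and put it in /opt/ml/code. Since SageMaker starts the container as docker run image train, that directory also has to be on the container's PATH so the train executable can be found. See the complete code here.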
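As a rough illustration of the approach above, an extensionless train file could look like the sketch below. It assumes the training code is Python, so the file starts with a shebang line and is made executable (for example with chmod +x train). The build_parser, run_training, and main names are hypothetical, and run_training is only a placeholder for the real training code.

```python
#!/usr/bin/env python3
# Hypothetical contents of an executable file named "train" (no extension).
# When SageMaker runs `docker run image train`, this file is what gets executed,
# so no extra "train" argument reaches argparse.
import argparse
import sys


def build_parser():
    # Same style of arguments as in the question (only two shown for brevity).
    parser = argparse.ArgumentParser()
    parser.add_argument('--epochs', type=int, default=2)
    parser.add_argument('--learning_rate', type=float, default=2e-5)
    return parser


def run_training(args):
    # Placeholder (assumption): the real training loop would go here.
    print(f"training for {args.epochs} epochs at lr={args.learning_rate}")


def main(argv=None):
    args = build_parser().parse_args(argv)
    run_training(args)
    return args


if __name__ == "__main__":
    main(sys.argv[1:])
```

Because the file itself is named train, the word train becomes the program name rather than a command-line argument.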
Upvotes: 2
Reputation: 325
Assuming train.py is at the root level or in the working directory of your Docker image, the following should solve the issue for you:
ENTRYPOINT ["python3", "train.py"]
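If the ENTRYPOINT is kept in exec form, Docker appends the train argument from docker run image train to the command, so the script may still see train as an extra argument. A minimal sketch, assuming the script should simply ignore that argument, is to use argparse's parse_known_args instead of parse_args. The parse_args helper name here is hypothetical, and only two of the question's arguments are shown.

```python
import argparse


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument('--epochs', type=int, default=2)
    parser.add_argument('--learning_rate', type=float, default=2e-5)
    # parse_known_args returns (namespace, leftovers) instead of exiting
    # with "unrecognized arguments" when it meets something unexpected.
    args, unknown = parser.parse_known_args(argv)
    return args, unknown


# Simulates the arguments the script would see after `docker run image train`.
args, unknown = parse_args(['train', '--epochs', '3'])
print(args.epochs, unknown)  # prints: 3 ['train']
```

The trade-off is that misspelled options are also silently ignored, so it can be worth logging the contents of unknown.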
Upvotes: 4