I'm trying to deploy my model (container) on AWS SageMaker. I've pushed the container to AWS ECR. Then I use an AWS Lambda that basically runs create_training_job() via the boto3 SageMaker client. It runs the container in train mode and uploads the generated artifact to S3, like this:
sm = boto3.client('sagemaker')
sm.create_training_job(
    TrainingJobName=full_job_name,
    HyperParameters={
        'general': json.dumps(
            {
                'environment': ENVIRONMENT,
                'region': REGION,
                'version': date_suffix,
                'hyperparameter_tuning': training_params.get('hyperparameter_tuning', False),
                'basket_analysis': training_params.get('basket_analysis', True),
                'init_inventory_cache': training_params.get('init_inventory_cache', True),
            }
        ),
        'aws_profile': '***-dev',
        'db_config': json.dumps(database_mapping),
        'model_server_params': json.dumps(training_params.get('model_server_params', {}))
    },
    AlgorithmSpecification={
        'TrainingImage': training_image,
        'TrainingInputMode': 'File',
    },
    RoleArn=ROLE_ARN,
    OutputDataConfig={
        'S3OutputPath': S3_OUTPUT_PATH
    },
    ResourceConfig={
        'InstanceType': INSTANCE_TYPE,
        'InstanceCount': 1,
        'VolumeSizeInGB': 20,
    },
    # VpcConfig={
    #     'SecurityGroupIds': SECURITY_GROUPS.split(','),
    #     'Subnets': SUBNETS.split(',')
    # },
    StoppingCondition={
        'MaxRuntimeInSeconds': int(MAX_RUNTIME_SEC),
        # 'MaxWaitTimeInSeconds': 1800
    },
    Tags=[],
    EnableNetworkIsolation=False,
    EnableInterContainerTrafficEncryption=False,
    EnableManagedSpotTraining=False,
)
I have a logger inside the container that confirms that /opt/ml/input/config/hyperparameters.json now exists. It has been added by SageMaker. Fine.
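For reference, the file can be read inside the container like this (a minimal sketch; the path is fixed by SageMaker, and the 'general' key matches the HyperParameters dict above):

import json

# SageMaker serializes the HyperParameters dict (all values as strings)
# to this fixed path inside the training container.
with open('/opt/ml/input/config/hyperparameters.json') as f:
    hyperparameters = json.load(f)

# 'general' was passed through json.dumps in the Lambda, so decode it again.
general = json.loads(hyperparameters['general'])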
But then, when I try to run the same container in serve mode (so, basically, to deploy it), I find that /opt/ml/input/config/hyperparameters.json no longer exists. I deploy it this way:
sm.create_model(
    ModelName=model_name,
    PrimaryContainer={
        'Image': training_image,
        'ModelDataUrl': model_artifact,
        'Environment': {
            'version': version
        }
    },
    ExecutionRoleArn=role_arn,
    Tags=[],
    # VpcConfig={
    #     'SecurityGroupIds': os.environ['security_groups'].split(','),
    #     'Subnets': os.environ['subnets'].split(',')
    # }
)
sm.create_endpoint_config(
    EndpointConfigName=config_name,
    ProductionVariants=[
        {
            'VariantName': variant_name,
            'ModelName': model_name,
            'InitialInstanceCount': instance_count,
            'InstanceType': instance_type,
            'InitialVariantWeight': 1
        },
    ],
    Tags=[],
)
existing_endpoints = sm.list_endpoints(NameContains=endpoint_name)
scaling_resource_id = f'endpoint/{endpoint_name}/variant/{variant_name}'
if not existing_endpoints['Endpoints']:
    sm.create_endpoint(
        EndpointName=endpoint_name,
        EndpointConfigName=config_name
    )
else:
    # aas is an Application Auto Scaling client: boto3.client('application-autoscaling')
    if aas.describe_scalable_targets(
            ServiceNamespace='sagemaker',
            ResourceIds=[scaling_resource_id],
            ScalableDimension='sagemaker:variant:DesiredInstanceCount')['ScalableTargets']:
        aas.deregister_scalable_target(
            ServiceNamespace='sagemaker',
            ResourceId=scaling_resource_id,
            ScalableDimension='sagemaker:variant:DesiredInstanceCount'
        )
    sm.update_endpoint(
        EndpointName=endpoint_name,
        EndpointConfigName=config_name
    )
This matters because the file seemed like a convenient way to pass parameters into the container from the outside (e.g. from the management console). I assumed the file/directory would still exist after training. Any ideas?
tl;dr: two options:

1. Copy the hyperparameters.json file to /opt/ml/model in the training logic and it will be packed with the model artifacts;
2. Use the PrimaryContainer parameter's Environment property.

Long version:
That file, /opt/ml/input/config/hyperparameters.json (in fact, the whole /opt/ml/input folder), is mounted on the training container when it is created. It is provided by SageMaker, based on the information you supply, for training purposes only. SageMaker does not change your container in any way, and it does not preserve this or any other configuration file it passes to the training job once training is done. If you want to pass parameters to the inference endpoint, this is not the way.
You could copy the hyperparameters.json file to the /opt/ml/model folder, and it would be packed with the model in the model.tar.gz tarball. Your inference code could then read it from there - but that's not the prescribed way to pass parameters to an endpoint, and it could cause problems with your framework.
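For example, at the end of your training logic (a minimal sketch, assuming the standard SageMaker container paths):

import shutil

# Everything under /opt/ml/model is packed into model.tar.gz after training
# and extracted back to /opt/ml/model inside the serving container.
shutil.copy('/opt/ml/input/config/hyperparameters.json',
            '/opt/ml/model/hyperparameters.json')

# Your serve-mode code could then read it back with:
# with open('/opt/ml/model/hyperparameters.json') as f:
#     hyperparameters = json.load(f)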
The generally prescribed way to pass parameters to SageMaker endpoints is through the environment. If you check the boto3 docs for create_model, you'll see that there's an Environment key within the PrimaryContainer parameter (and within each entry of the Containers parameter). In fact, your code above already uses it to pass a version parameter. Use it to pass any parameters to your model and, from there, to any endpoint based on that model.
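For example (a sketch; MODEL_SERVER_PARAMS is an illustrative name, not a SageMaker convention, and Environment keys and values must be strings):

# Lambda side: pass whatever the endpoint needs via Environment.
sm.create_model(
    ModelName=model_name,
    PrimaryContainer={
        'Image': training_image,
        'ModelDataUrl': model_artifact,
        'Environment': {
            'version': version,
            'MODEL_SERVER_PARAMS': json.dumps(training_params.get('model_server_params', {})),
        }
    },
    ExecutionRoleArn=role_arn,
)

# Container side (serve mode): read the values back.
import json
import os

model_server_params = json.loads(os.environ.get('MODEL_SERVER_PARAMS', '{}'))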