Francesco Camussoni

Reputation: 11

ClientError: Failed to download data. Please check your s3 objects and ensure that there is no object that is both a folder as well as a file

How are you?

I'm trying to execute a SageMaker job, but I get this error:

ClientError: Failed to download data. Cannot download s3://pocaaml/sagemaker/xsell_sc1_test/model/model_lgb.tar.gz, a previously downloaded file/folder clashes with it. Please check your s3 objects and ensure that there is no object that is both a folder as well as a file.

I have that model_lgb.tar.gz at that S3 path, as you can see here:

[Screenshot: S3 console showing model_lgb.tar.gz under s3://pocaaml/sagemaker/xsell_sc1_test/model/]
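The error message suggests that some key under the prefix exists both as a file and as a "folder" (i.e. as a prefix of other keys). This is a minimal sketch of how such a clash could be checked, assuming the bucket and prefix from my code below:

import boto3

s3 = boto3.client("s3")

# Collect every key under the prefix that the processing job downloads
paginator = s3.get_paginator("list_objects_v2")
keys = []
for page in paginator.paginate(Bucket="pocaaml", Prefix="sagemaker/xsell_sc1_test/model"):
    keys.extend(obj["Key"] for obj in page.get("Contents", []))

# A clash means a key also appears as a "folder", i.e. as a prefix of another key
for key in keys:
    if any(other.startswith(key + "/") for other in keys):
        print("{} exists as both a file and a folder".format(key))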

This is my code:

import boto3
import sagemaker
from time import gmtime, strftime
from sagemaker import get_execution_role
from sagemaker.sklearn.processing import SKLearnProcessor
from sagemaker.processing import ProcessingInput, ProcessingOutput

project_name = 'xsell_sc1_test'
s3_bucket = "pocaaml"
prefix = "sagemaker/" + project_name
account_id = "029294541817"
s3_bucket_base_uri = "s3://{}".format(s3_bucket)
dev = "dev-{}".format(strftime("%y-%m-%d-%H-%M", gmtime()))

region = sagemaker.Session().boto_region_name
print("Using AWS Region: {}".format(region))

# Get a SageMaker-compatible role used by this Notebook Instance.
role = get_execution_role()

boto3.setup_default_session(region_name=region)

boto_session = boto3.Session(region_name=region)

s3_client = boto3.client("s3", region_name=region)

sagemaker_boto_client = boto_session.client("sagemaker")  # is this one needed?

sagemaker_session = sagemaker.session.Session(
    boto_session=boto_session, sagemaker_client=sagemaker_boto_client
)

sklearn_processor = SKLearnProcessor(
    framework_version="0.23-1",
    role=role,
    instance_type="ml.m5.4xlarge",
    instance_count=1,
)

PREPROCESSING_SCRIPT_LOCATION = 'funciones_altas.py'

preprocessing_input_code = sagemaker_session.upload_data(
    PREPROCESSING_SCRIPT_LOCATION,
    bucket=s3_bucket,
    key_prefix="{}/{}".format(prefix, "code")
)

preprocessing_input_data = "{}/{}/{}".format(s3_bucket_base_uri, prefix, "data")
preprocessing_input_model = "{}/{}/{}".format(s3_bucket_base_uri, prefix, "model")
preprocessing_output = "{}/{}/{}/{}/{}".format(s3_bucket_base_uri, prefix, dev, "preprocessing", "output")

processing_job_name = project_name.replace("_", "-") + "-preprocess-{}".format(strftime("%d-%H-%M-%S", gmtime()))

sklearn_processor.run(
    code=preprocessing_input_code,
    job_name=processing_job_name,
    inputs=[
        ProcessingInput(input_name="data",
                        source=preprocessing_input_data,
                        destination="/opt/ml/processing/input/data"),
        ProcessingInput(input_name="model",
                        source=preprocessing_input_model,
                        destination="/opt/ml/processing/input/model"),
    ],
    outputs=[
        ProcessingOutput(output_name="output",
                         destination=preprocessing_output,
                         source="/opt/ml/processing/output"),
    ],
    wait=False,
)

preprocessing_job_description = sklearn_processor.jobs[-1].describe()

In funciones_altas.py I'm using ohe_altas.tar.gz, not model_lgb.tar.gz, which makes this error very weird.

Can you help me?

Upvotes: 0

Views: 1565

Answers (1)

Fatema Alkhanaizi

Reputation: 36

It looks like you are using the SageMaker-generated execution role, and the error is related to S3 permissions.

Here are a couple of things you can do:

  1. Make sure the policies attached to the role grant access to your bucket (see the sketch after this list for a quick way to check).
  2. Check whether the objects in your bucket are encrypted; if they are, also attach a KMS policy to the role you are linking to the job. See https://aws.amazon.com/premiumsupport/knowledge-center/s3-403-forbidden-error/
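A minimal sketch of both checks with boto3, assuming the role, bucket, and object key from the question:

import boto3
from sagemaker import get_execution_role

iam = boto3.client("iam")
s3 = boto3.client("s3")

# 1. List the managed policies attached to the execution role
#    (the role name is the last segment of the role ARN; inline
#    policies would show up under list_role_policies instead)
role_name = get_execution_role().split("/")[-1]
for policy in iam.list_attached_role_policies(RoleName=role_name)["AttachedPolicies"]:
    print(policy["PolicyName"])

# 2. Check whether the object is KMS-encrypted
head = s3.head_object(Bucket="pocaaml", Key="sagemaker/xsell_sc1_test/model/model_lgb.tar.gz")
print(head.get("ServerSideEncryption"))  # "aws:kms" means the role also needs kms:Decrypt
print(head.get("SSEKMSKeyId"))           # the key a KMS policy must cover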

You can always create your own role as well and pass the arn to the code to run the processing job.
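For example (the ARN below is a hypothetical placeholder; substitute the ARN of a role you created with the required S3, and if needed KMS, permissions):

from sagemaker.sklearn.processing import SKLearnProcessor

# Hypothetical ARN -- replace with your own role
custom_role_arn = "arn:aws:iam::123456789012:role/MySageMakerProcessingRole"

sklearn_processor = SKLearnProcessor(
    framework_version="0.23-1",
    role=custom_role_arn,
    instance_type="ml.m5.4xlarge",
    instance_count=1,
)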

Upvotes: 1
