ky2ninh

Reputation: 61

AWS Glue Job getting Access Denied when writing to S3

I have a Glue ETL job, created by CloudFormation, that extracts data from RDS Aurora and writes it to S3.

When I run this job, I get the error below.

The job has an IAM service role.

This service role (sketched below):

  1. trusts the Glue and RDS services,
  2. has arn:aws:iam::aws:policy/AmazonS3FullAccess and arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole attached, and
  3. allows the full range of rds:*, kms:*, and s3:* actions on the corresponding RDS, KMS, and S3 resources.
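
For reference, here is roughly what the CloudFormation template sets up, sketched with boto3 (a sketch only; the role name is a placeholder):

import json
import boto3

iam = boto3.client("iam")

# Trust policy: the role can be assumed by the Glue and RDS services.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": ["glue.amazonaws.com", "rds.amazonaws.com"]},
            "Action": "sts:AssumeRole",
        }
    ],
}

iam.create_role(
    RoleName="my-glue-service-role",  # placeholder
    AssumeRolePolicyDocument=json.dumps(trust_policy),
)

# Attach the two managed policies listed above.
for policy_arn in (
    "arn:aws:iam::aws:policy/AmazonS3FullAccess",
    "arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole",
):
    iam.attach_role_policy(RoleName="my-glue-service-role", PolicyArn=policy_arn)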

I get the same error whether the S3 bucket is encrypted with AES256 or aws:kms.

I get the same error whether the job has a Security Configuration or not.

I have a manually created job that does exactly the same thing, and it runs successfully without a Security Configuration.

What am I missing? Here's the full error log:

"/mnt/yarn/usercache/root/appcache/application_1...5_0002/container_15...45_0002_01_000001/py4j-0.10.4-src.zip/py4j/protocol.py", line 319, in get_return_value py4j.protocol.Py4JJavaError: An error occurred while calling o145.pyWriteDynamicFrame. : org.apache.spark.SparkException: Job aborted due to stage failure: Task 3 in stage 2.0 failed 4 times, most recent failure: Lost task 3.3 in stage 2.0 (TID 30, ip-10-....us-west-2.compute.internal, executor 1): com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.model.AmazonS3Exception: Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: F...49), S3 Extended Request ID: eo...wXZw= at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1588

Upvotes: 6

Views: 24989

Answers (7)

Rodrigo Takeuti

Reputation: 21

I had the same issue. In my case, the Glue job's Security Configuration was misconfigured: I was using an S3 KMS key that had no access to the bucket in the Lake Formation account. After I corrected it, the job ran perfectly!

[Screenshots: the corrected Security Configuration and the related KMS key policy]
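
The fix boils down to a key-policy statement along these lines (a sketch; the account ID and role name are placeholders):

# Key-policy statement letting the Glue job's role use the KMS key for
# SSE-KMS reads and writes. The account ID and role name are placeholders.
allow_glue_role = {
    "Sid": "AllowGlueRoleUseOfTheKey",
    "Effect": "Allow",
    "Principal": {"AWS": "arn:aws:iam::111122223333:role/my-glue-role"},
    "Action": [
        "kms:Encrypt",
        "kms:Decrypt",
        "kms:ReEncrypt*",
        "kms:GenerateDataKey*",
        "kms:DescribeKey",
    ],
    "Resource": "*",
}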

Upvotes: 0

Ashish Rai

Reputation: 21

You should add a Security Configuration (under the Security tab on the Glue console), setting the S3 encryption mode to either SSE-KMS or SSE-S3.

[Screenshot: Security Configuration]

Now select the above security configuration while creating your job, under Advanced Properties.

Also verify your IAM role and S3 bucket policy. It will then work.
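
If you prefer to script it, the same security configuration can be created with boto3 (a sketch; the name and key ARN are placeholders):

import boto3

glue = boto3.client("glue")

# Security configuration that encrypts the job's S3 output with SSE-KMS;
# use "SSE-S3" and drop KmsKeyArn for S3-managed keys instead.
glue.create_security_configuration(
    Name="my-sse-kms-config",  # placeholder
    EncryptionConfiguration={
        "S3Encryption": [
            {
                "S3EncryptionMode": "SSE-KMS",
                "KmsKeyArn": "arn:aws:kms:us-west-2:111122223333:key/EXAMPLE-KEY-ID",
            }
        ]
    },
)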

Upvotes: 2

Galdil

Reputation: 589

Make sure you have attached the right policies. I was facing the same issue and thought I had the role configured correctly, but after I deleted the role and set it up again from scratch, it worked ;]

Upvotes: 0

Deepak Sood

Reputation: 405

For me it was two things (both sketched below):

  1. The access policy for the bucket must be specified correctly as bucket/*; here I was missing the * part.
  2. An endpoint must be created in the VPC for Glue to access S3: https://docs.aws.amazon.com/glue/latest/dg/vpc-endpoints-s3.html

After these two settings, my Glue job ran successfully. Hope this helps.
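
Both points, sketched with boto3 (the bucket, role, VPC, and route table names are placeholders; the region matches the us-west-2 hostname in the question's error log):

import json
import boto3

# Point 1: object-level S3 actions need the /* resource; bucket-level
# actions such as ListBucket need the bare bucket ARN.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject"],
            "Resource": "arn:aws:s3:::my-etl-bucket/*",
        },
        {
            "Effect": "Allow",
            "Action": "s3:ListBucket",
            "Resource": "arn:aws:s3:::my-etl-bucket",
        },
    ],
}
iam = boto3.client("iam")
iam.put_role_policy(
    RoleName="my-glue-role",  # placeholder
    PolicyName="s3-access",
    PolicyDocument=json.dumps(policy),
)

# Point 2: a gateway endpoint so a Glue job running inside the VPC can
# reach S3. VPC and route table IDs are placeholders.
ec2 = boto3.client("ec2")
ec2.create_vpc_endpoint(
    VpcEndpointType="Gateway",
    VpcId="vpc-0123456789abcdef0",
    ServiceName="com.amazonaws.us-west-2.s3",
    RouteTableIds=["rtb-0123456789abcdef0"],
)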

Upvotes: 0

Sandeep Fatangare

Reputation: 2144

How are you granting PassRole permission to the Glue role?

{
  "Sid": "AllowAccessToRoleOnly",
  "Effect": "Allow",
  "Action": [
    "iam:PassRole",
    "iam:GetRole",
    "iam:GetRolePolicy",
    "iam:ListRolePolicies",
    "iam:ListAttachedRolePolicies"
  ],
  "Resource": "arn:aws:iam::*:role/<role>"
}

Usually we create roles named <project>-<role>-<env>, e.g. xyz-glue-dev, where the project name is xyz and the env is dev. In that case we use "Resource": "arn:aws:iam::*:role/xyz-*-dev".

Upvotes: 0

David

Reputation: 321

In addition to Lydon's answer, a 403 error is also returned if your data source location is the same as the data target (both defined when creating a job in Glue). If they are identical, change one of them and the issue will be resolved.

Upvotes: 1

Lydon

Reputation: 329

Unfortunately the error doesn't tell us much, except that it's failing during the write of your DynamicFrame.

There are only a handful of possible reasons for the 403; check whether you have covered them all:

  1. Bucket policy rules on the destination bucket.
  2. The IAM role needs permissions (although you mention having s3:*).
  3. If this is cross-account, there is more to check, such as allow policies on both the bucket and the user. (In general, a trust for the canonical account ID is simplest.)
  4. I don't know how complicated your policy documents are for the role and the bucket, but remember that an explicit Deny statement takes precedence over an Allow (see the sketch after this list).
  5. If the issue is KMS related, check that the subnet you selected for the Glue connection has a route to the KMS endpoints (you can add an endpoint for KMS in the VPC).
  6. Make sure the issue is not with the temporary directory that is also configured for your job, or with intermediate write operations that are not your final output.
  7. Check that your account is the "object owner" of the location you are writing to (normally an issue when reading/writing data between accounts).
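
For points 2 and 4, the IAM policy simulator is a quick way to surface an explicit Deny in the role's identity policies (a sketch; the role ARN and bucket are placeholders, and bucket policies still need to be checked separately):

import boto3

iam = boto3.client("iam")

# Ask the policy simulator whether the job's role may write to the target
# prefix; an explicit Deny in the role's identity policies shows up as an
# "explicitDeny" decision.
response = iam.simulate_principal_policy(
    PolicySourceArn="arn:aws:iam::111122223333:role/my-glue-role",  # placeholder
    ActionNames=["s3:PutObject"],
    ResourceArns=["arn:aws:s3:::my-etl-bucket/output/*"],  # placeholder
)
for result in response["EvaluationResults"]:
    print(result["EvalActionName"], result["EvalDecision"])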

If none of the above works, please share more details about your setup, perhaps the code for the write operation.

Upvotes: 5
