Aswin Murugesh

Reputation: 11080

Spark S3 Write - Getting Access Denied Error when writing to a bucket

I am trying to read and write files in an S3 bucket. I created an IAM user in the AWS console and configured the AWS CLI on my EMR instance with that user's keys. From the CLI I can read and write files in a specific S3 bucket.

From inside the Spark shell, however, I can read a file from the bucket, but writing the same file to a different path in the same bucket fails with an AccessDenied error. These are the commands I run:

sc.hadoopConfiguration.set("fs.s3.awsAccessKeyId", "awsAccessKeyId")
sc.hadoopConfiguration.set("fs.s3.awsSecretAccessKey", "awsSecretAccessKey")
val a = spark.read.parquet("s3://path.parquet")
a.write.parquet("s3://path.parquet")

Here's the error message:

Caused by: com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.model.AmazonS3Exception: Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: ; S3 Extended Request ID: , S3 Extended Request ID: 
    at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1712)
    at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1367)
    at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1113)
    at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:770)
    at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:744)
    at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:726)
    at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:686)
    at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:668)
    at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:532)
    at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:512)
    at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4914)
    at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4860)
    at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.AmazonS3Client.initiateMultipartUpload(AmazonS3Client.java:3552)
    at com.amazon.ws.emr.hadoop.fs.s3.lite.call.InitiateMultipartUploadCall.perform(InitiateMultipartUploadCall.java:22)
    at com.amazon.ws.emr.hadoop.fs.s3.lite.call.InitiateMultipartUploadCall.perform(InitiateMultipartUploadCall.java:8)
    at com.amazon.ws.emr.hadoop.fs.s3.lite.executor.GlobalS3Executor.execute(GlobalS3Executor.java:91)
    at com.amazon.ws.emr.hadoop.fs.s3.lite.AmazonS3LiteClient.invoke(AmazonS3LiteClient.java:184)
    at com.amazon.ws.emr.hadoop.fs.s3.lite.AmazonS3LiteClient.initiateMultipartUpload(AmazonS3LiteClient.java:145)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:409)
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:163)
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:155)
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:346)
    at com.sun.proxy.$Proxy32.initiateMultipartUpload(Unknown Source)
    at com.amazon.ws.emr.hadoop.fs.s3n.MultipartUploadOutputStream.ensureMultipartUploadIsInitiated(MultipartUploadOutputStream.java:541)
    at com.amazon.ws.emr.hadoop.fs.s3n.MultipartUploadOutputStream.uploadSinglePartWithMultipartUpload(MultipartUploadOutputStream.java:399)
    at com.amazon.ws.emr.hadoop.fs.s3n.MultipartUploadOutputStream.doMultiPartUpload(MultipartUploadOutputStream.java:436)
    ... 24 more

Thanks in advance.

Upvotes: 1

Views: 3306

Answers (1)

Mark

Reputation: 66

Check your IAM permissions. If you use a custom-named IAM role, make sure the iam:PassRole permission is granted for it, and check for typos in the role name in its ARN, e.g. arn:aws:iam::123456789012:role/YourName.

See: AWS Docs
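
As a rough illustration of what to look for when checking permissions: the stack trace in the question fails inside initiateMultipartUpload, and S3 authorizes that call under s3:PutObject, so whichever identity Spark actually runs as needs write access to the target path, not only read access. A minimal policy sketch, assuming a hypothetical bucket called YOUR-BUCKET-NAME (a placeholder, not taken from the question), might look like this:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": "arn:aws:s3:::YOUR-BUCKET-NAME"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
      "Resource": "arn:aws:s3:::YOUR-BUCKET-NAME/*"
    }
  ]
}
```

s3:DeleteObject is included on the assumption that the write cleans up temporary objects when the job commits. Trim the action list and narrow the resource to a prefix if your setup allows it.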

Upvotes: 2
