J Calbreath

Reputation: 2705

Spark credential chain ordering - S3 Exception Forbidden

I'm running Spark 2.4 on an EC2 instance. I am assuming an IAM role and setting the access key, secret key, and session token in sparkSession.sparkContext.hadoopConfiguration, along with setting the credentials provider to "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider".
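For reference, the setup looks roughly like this (a sketch: tempCredentials stands in for whatever the STS assume-role call returned, and sparkSession is the already-built session; the fs.s3a.* property names are the standard S3A ones):

// Sketch of the configuration described above. "tempCredentials" is a
// placeholder for the Credentials object returned by the assume-role call.
val hc = sparkSession.sparkContext.hadoopConfiguration
hc.set("fs.s3a.access.key", tempCredentials.getAccessKeyId)
hc.set("fs.s3a.secret.key", tempCredentials.getSecretAccessKey)
hc.set("fs.s3a.session.token", tempCredentials.getSessionToken)
hc.set("fs.s3a.aws.credentials.provider",
  "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider")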

When I try to read a dataset from S3 (using the s3a:// scheme, which is also set in the Hadoop config), I get this error:

com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 403, AWS Service: Amazon S3, AWS Request ID: 7376FE009AD36330, AWS Error Code: null, AWS Error Message: Forbidden

read command:

val myData = sparkSession.read.parquet("s3a://myBucket/myKey")

I've repeatedly checked the S3 path and it's correct. My assumed IAM role has the right privileges on the S3 bucket. The only thing I can figure at this point is that Spark has some sort of hidden credential chain ordering: even though I have set the credentials in the Hadoop config, it is still grabbing credentials from somewhere else (my instance profile?). But I have no way to diagnose that.

Any help is appreciated. Happy to provide any more details.

Upvotes: 2

Views: 1014

Answers (1)

stevel

Reputation: 13430

  1. spark-submit will pick up the AWS_* environment variables and set them as the fs.s3a access, secret, and session keys, overwriting any values you've already set.
  2. If you only want to use the IAM credentials, just set fs.s3a.aws.credentials.provider to com.amazonaws.auth.InstanceProfileCredentialsProvider; it'll be the only one used (see the sketch after this list).
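
A minimal sketch of option 2, assuming it runs before the first S3A filesystem for the bucket is created (the bucket/key path reuses the placeholder from the question):

// Restrict S3A to the EC2 instance profile credentials only, so
// env-var-derived keys can no longer shadow them.
val hc = sparkSession.sparkContext.hadoopConfiguration
hc.set("fs.s3a.aws.credentials.provider",
  "com.amazonaws.auth.InstanceProfileCredentialsProvider")
val myData = sparkSession.read.parquet("s3a://myBucket/myKey")

The same setting can also be passed at submit time as --conf spark.hadoop.fs.s3a.aws.credentials.provider=com.amazonaws.auth.InstanceProfileCredentialsProvider.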

Further Reading: Troubleshooting S3A

Upvotes: 1
