Reputation: 1032
I have a Spark process that takes two input files from S3. At the end of the job, I simply want to write the results back into S3 with saveAsTextFile
method. However, I am getting Access Denied
errors.
My bucket policy is wide open, to rule out any permission errors:
{
  "Version": "2012-10-17",
  "Id": "Policy1457106962648",
  "Statement": [
    {
      "Sid": "Stmt1457106959104",
      "Effect": "Allow",
      "Principal": "*",
      "Action": "s3:*",
      "Resource": "arn:aws:s3:::<bucket-name>/*"
    }
  ]
}
I set my credentials on the SparkConf like the following:
SparkConf conf = new SparkConf()
        .setAppName("GraphAnalyser")
        .setMaster("local[*]")
        .set("spark.driver.memory", "2G")
        // spark.hadoop.* properties are copied into the Hadoop configuration,
        // where the s3/s3n filesystems pick up the credentials.
        .set("spark.hadoop.fs.s3.awsAccessKeyId", [access-key])
        .set("spark.hadoop.fs.s3n.awsAccessKeyId", [access-key])
        .set("spark.hadoop.fs.s3.awsSecretAccessKey", [secret-key])
        .set("spark.hadoop.fs.s3n.awsSecretAccessKey", [secret-key]);
And I pass file URLs using the s3n protocol:
final String SC_NODES_FILE = "s3n://" + BUCKET_NAME + "/" + NODES_FILE;
final String SC_EDGES_FILE = "s3n://" + BUCKET_NAME + "/" + EDGES_FILE;
final String SC_OUTPUT_FILE = "s3n://" + BUCKET_NAME + "/output";
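For context, here is a minimal sketch of the surrounding job (the actual graph computation is elided; the union below is just a stand-in):

// Build the context from the conf above; reading the inputs works fine.
JavaSparkContext sc = new JavaSparkContext(conf);
JavaRDD<String> nodes = sc.textFile(SC_NODES_FILE);
JavaRDD<String> edges = sc.textFile(SC_EDGES_FILE);

// The failure happens here: before writing, Hadoop's S3 filesystem checks
// that the output path does not already exist.
nodes.union(edges).saveAsTextFile(SC_OUTPUT_FILE);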
Note that I have no trouble accessing the input files. It seems that Spark sends a HEAD request for the output path to make sure it does not exist before attempting to save the final results. Since S3 returns Access Denied instead of Not Found, that is probably why Spark throws an exception and exits.
org.apache.hadoop.fs.s3.S3Exception: org.jets3t.service.S3ServiceException: S3 HEAD request failed for '/output.csv' - ResponseCode=403, ResponseMessage=Forbidden
Versions: Spark 1.6.0, aws-java-sdk 1.10.58, spark-core_2.10 1.6.0
Your help is appreciated. Thank you very much.
Upvotes: 1
Views: 1417
Reputation: 1032
Answering my own question:
It turns out that I needed the s3:ListBucket action, which only applies when the resource is the bucket itself, not the keys inside the bucket.
In my original policy file I had the following resource:
"Resource": "arn:aws:s3:::<bucket-name>/*"
I had to also add the bucket itself as a resource:
"Resource": "arn:aws:s3:::<bucket-name>"
Here's my final policy file that works for me:
{
  "Id": "Policy145712123124123",
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Stmt145712812312323",
      "Action": [
        "s3:DeleteObject",
        "s3:GetObject",
        "s3:ListBucket",
        "s3:PutObject"
      ],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:s3:::<bucket-name>",
        "arn:aws:s3:::<bucket-name>/*"
      ],
      "Principal": {
        "AWS": [
          "arn:aws:iam::<account-id>:user/<user-name>"
        ]
      }
    }
  ]
}
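A quick way to verify the new permission is to list the bucket with the aws-java-sdk that is already on the classpath; listObjects exercises s3:ListBucket directly. A sketch, with placeholder credentials and bucket name:

import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.services.s3.AmazonS3Client;
import com.amazonaws.services.s3.model.ObjectListing;

// With only the "/*" object-level resource this call fails with
// 403 AccessDenied; once the bucket ARN itself is in the policy it succeeds.
AmazonS3Client s3 = new AmazonS3Client(
        new BasicAWSCredentials("<access-key>", "<secret-key>"));
ObjectListing listing = s3.listObjects("<bucket-name>");
System.out.println("Objects visible: " + listing.getObjectSummaries().size());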
Upvotes: 3