Reputation: 567
I have the following code:
val datasink3 = glueContext
  .getSinkWithFormat(
    connectionType = "s3",
    options = JsonOptions(Map("path" -> outputPath)),
    format = "parquet",
    transformationContext = "datasink3")
  .writeDynamicFrame(repartitionedDataSource3)
This write fails with:
Exception in User Class: com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.model.AmazonS3Exception : Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: 9K7H4CDMRM3AM51H; S3 Extended Request ID: DgRwQ8tvq2FjlmVJ4GkBjYW5xwN8lMYtoStvpe8zRr+bSx0pwcybYDSuZYXXJN0pF1pWHiziuAI=)
However, if I switch the output format to csv:
val datasink3 = glueContext
  .getSinkWithFormat(
    connectionType = "s3",
    options = JsonOptions(Map("path" -> outputPath)),
    format = "csv",
    transformationContext = "datasink3")
  .writeDynamicFrame(repartitionedDataSource3)
It works! What the hell!
The IAM policy grants the following permissions. None of the resource-level permissions restrict by file type:
"Statement": [
    {
        "Sid": "VisualEditor0",
        "Effect": "Allow",
        "Action": [
            "s3:PutObject",
            "s3:GetObject",
            "s3:ListBucket",
            "s3:DeleteObject"
        ]
Any ideas? This is weird as hell
Upvotes: 1
Views: 334
Reputation: 567
Here's the issue. I had scoped the role's permissions so that it only had access to certain folders, i.e.
bucket/toplevelfolder/subfolder*
Glue uses Spark as its ETL engine under the hood. Before writing to the actual destination, the Glue job tried to create a Spark placeholder object named "toplevelfolder%24folder%24" (%24 is the URL-encoded "$") at "s3://bucket/", a location the role had no access to.
Adding S3 permissions on "s3://bucket/*" let the role write the necessary placeholder objects before writing into the prefix where the Glue (Spark) job outputs the data.
This only happens with Parquet because a Parquet write, by default, creates temporary folders with s3/s3n. This comes from the EMRFS implementation described in
https://aws.amazon.com/premiumsupport/knowledge-center/emr-s3-empty-files/
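To make the reasoning above concrete, here is a minimal sketch in plain Scala (no AWS calls) that lists the placeholder keys a write might touch and checks them against a resource pattern. It assumes the "<prefix>_$folder$" marker naming described in the linked article, and it only handles the "*" wildcard. The object name, both helper functions, and the sample key are hypothetical and are not part of Glue or EMRFS.

```scala
import java.util.regex.Pattern

object PlaceholderCheck {
  // Hypothetical helper: for an object key, list the "<prefix>_$folder$"
  // marker keys of every parent prefix. The naming convention is an
  // assumption taken from the AWS article linked above.
  def folderMarkers(key: String): Seq[String] = {
    val parts = key.split("/").filter(_.nonEmpty).toSeq
    (1 until parts.length).map(n => parts.take(n).mkString("/") + "_$folder$")
  }

  // Hypothetical helper: simplified IAM-style match where "*" matches any
  // run of characters (including "/"). Other wildcards such as "?" are
  // deliberately not handled in this sketch.
  def iamMatches(pattern: String, key: String): Boolean = {
    val regex = pattern.split("\\*", -1).map(s => Pattern.quote(s)).mkString(".*")
    key.matches(regex)
  }
}
```

With a key such as "toplevelfolder/subfolder/part-0000.parquet", the top-level marker falls outside a pattern like "toplevelfolder/subfolder*" but inside "*", which is consistent with the Access Denied error going away once the broader permission was added.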
Upvotes: 2