Mohamad Shaker

Reputation: 1486

Cannot write spark job output into s3 bucket directly

I have a Spark job that writes its results to an S3 bucket. The thing is, when the output path is just the bucket root, like s3a://bucket_name/, I get this error:

Caused by: com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 404, AWS Service: Amazon S3, AWS Request ID: xxx, AWS Error Code: NoSuchKey, AWS Error Message: null, S3 Extended Request ID: xxx

But when I add a subfolder inside the output bucket (s3a://bucket_name/subfolder/), it works!

I'm using hadoop-aws 2.7.3 to read from S3.
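The write itself is nothing special; it is roughly this (a simplified sketch, the input path and DataFrame are just placeholders):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("s3-output-example").getOrCreate()

    // Placeholder input; the real job reads other data.
    val df = spark.read.parquet("s3a://bucket_name/input/")

    // Writing to the bucket root fails with the 404 NoSuchKey error above:
    df.write.parquet("s3a://bucket_name/")

    // Writing to a subfolder under the same bucket works:
    df.write.parquet("s3a://bucket_name/subfolder/")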

What is the problem?

Thanks in advance.

Upvotes: 1

Views: 668

Answers (1)

stevel

Reputation: 13430

Not a Spark bug. It's an issue with how the S3 clients handle root directories; they are "special". HADOOP-13402 sort of looks at it. The stack trace you have there is from Amazon's own object store client, but it clearly behaves the same way.

To put it differently: you wouldn't commit work to "file:///" or "hdfs:///" either; everything expects a subdirectory.
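In practice that means pointing the output at a key prefix under the bucket rather than at the bucket itself, along these lines (a minimal sketch; the "output" subfolder name is just an example):

    // Don't write to the bucket root:
    // df.write.parquet("s3a://bucket_name/")

    // Do write to a "directory" (key prefix) under the bucket:
    df.write.parquet("s3a://bucket_name/output/")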

Sorry.

Upvotes: 1
