Reputation: 343
I have several Spark jobs that write data to and read data from S3. Occasionally (about once a week, for roughly three hours at a time), these jobs fail with the following exception:
org.apache.spark.sql.AnalysisException: Path does not exist.
I've found that this is likely due to S3's consistency model, in which list operations are eventually consistent. S3Guard claims to solve this issue, but I'm working in a Spark environment that doesn't support it.
Has anyone else run into this issue and figured out a reasonable approach for dealing with it?
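One common stopgap (not from this thread, and not a guarantee) is to treat the error as transient and retry the read with exponential backoff, on the assumption that the listing eventually catches up. The sketch below is a minimal, Spark-independent helper; the names `retry_on_missing_path` and `is_missing_path_error`, the default attempt count and delays, and matching on the "Path does not exist" message text are all hypothetical choices made for illustration.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def is_missing_path_error(exc: Exception) -> bool:
    # Assumption: match on the message text instead of importing
    # pyspark's AnalysisException, so the helper runs without Spark.
    return "Path does not exist" in str(exc)


def retry_on_missing_path(
    action: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `action`, retrying with exponential backoff (base_delay * 2**n
    seconds) while it fails with a missing-path error. Any other error,
    or a missing-path error on the final attempt, is re-raised."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return action()
        except Exception as exc:
            if not is_missing_path_error(exc) or attempt == attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))
    raise AssertionError("unreachable")
```

In a PySpark job this would wrap the read, e.g. `df = retry_on_missing_path(lambda: spark.read.parquet("s3a://my-bucket/my-path"))`, where the bucket and path are placeholders. Backoff only papers over the symptom; it cannot help if the data is genuinely missing, so keep the attempt count bounded.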
Reputation: 13430
Otherwise: don't use S3 as your direct destination of work.