Reputation: 1
I have a Beam application running with the Spark runner. It ran into what looks like a data-loss issue while saving data to S3 storage. I looked into this page: https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/committers.html
It suggests using an S3A committer for Spark jobs. I followed the suggestion and added the configuration to this Beam job, but I don't know whether it actually used an S3A committer to save the data. So I want to ask: what do I need to configure in this Beam job so that it uses an S3A committer, and how can I prove that the S3A committer was used?
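For reference, the kind of configuration that page describes for Spark looks roughly like this (a sketch using the directory committer; the property names are from the hadoop-aws docs, and whether the Beam Spark runner actually picks them up is exactly what I'm unsure about):

```java
import org.apache.spark.SparkConf;

public class S3aCommitterConf {
    public static SparkConf build() {
        return new SparkConf()
            // Route s3a:// output through the S3A committer factory;
            // the "spark.hadoop." prefix forwards the setting into
            // the Hadoop Configuration.
            .set("spark.hadoop.mapreduce.outputcommitter.factory.scheme.s3a",
                 "org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory")
            // Pick a committer: "directory", "partitioned" or "magic".
            .set("spark.hadoop.fs.s3a.committer.name", "directory");
    }
}
```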
Upvotes: -1
Views: 36
Reputation: 13470
The S3A committers always create a JSON _SUCCESS file containing various statistics, whereas the older committer creates a zero-byte _SUCCESS file. Looking for a _SUCCESS file more than 0 bytes long is enough; parsing it as JSON is even better.
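A minimal sketch of that check using the Hadoop FileSystem API (the bucket and output path are placeholders; it needs hadoop-aws and your S3 credentials on the classpath):

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URI;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SuccessFileCheck {
    public static void main(String[] args) throws Exception {
        // Placeholder destination -- point this at your job's output dir.
        Path success = new Path("s3a://my-bucket/output/_SUCCESS");
        FileSystem fs = FileSystem.get(
                URI.create("s3a://my-bucket/"), new Configuration());

        FileStatus status = fs.getFileStatus(success);
        if (status.getLen() == 0) {
            // Zero bytes: the classic FileOutputCommitter wrote it.
            System.out.println("Zero-byte _SUCCESS: S3A committer NOT used.");
        } else {
            // Non-empty: an S3A committer wrote its JSON summary here.
            System.out.println("Non-empty _SUCCESS, contents:");
            try (BufferedReader r = new BufferedReader(
                    new InputStreamReader(fs.open(success),
                                          StandardCharsets.UTF_8))) {
                r.lines().forEach(System.out::println);
            }
        }
    }
}
```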
However, I don't think Beam works with it at all. AFAIK nobody has ever tested it.
Upvotes: 0