Jie Jason Li

Reputation: 1

How to configure a Beam application with the Spark runner to use an S3A committer?

I have a Beam application running with the Spark runner. It encountered a data-loss issue when saving data to S3 storage. I looked into this page: https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/committers.html

It suggests using an S3A committer for Spark jobs. I followed that suggestion and added configuration to this Beam job (roughly the settings sketched below), but I don't know whether it actually used an S3A committer to save the data. So I just want to ask: what do I need to configure so this Beam job uses an S3A committer, and how can I prove that an S3A committer was used?
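For reference, the settings that page documents for plain Spark jobs look like this. This is only a sketch: the committer name "directory" is one of the documented choices, and the two spark.sql.* classes come from Spark's spark-hadoop-cloud module, which has to be on the classpath. Whether Beam's Spark runner actually goes through these committers is part of what I'm asking.

    # spark-defaults.conf, or pass each entry as --conf on spark-submit
    spark.hadoop.fs.s3a.committer.name        directory
    spark.sql.sources.commitProtocolClass     org.apache.spark.internal.io.cloud.PathOutputCommitProtocol
    spark.sql.parquet.output.committer.class  org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter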

Upvotes: -1

Views: 36

Answers (1)

stevel

Reputation: 13470

The s3a committers always create a JSON _SUCCESS file with various statistics, whereas the older committer creates a zero-byte _SUCCESS file. Looking for a _SUCCESS file more than zero bytes long is enough; parsing it as JSON is even better.
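A minimal sketch of that check using the plain Hadoop FileSystem API. The bucket and output path are placeholders, and hadoop-aws plus the AWS SDK must be on the classpath for the s3a:// scheme to resolve:

    import java.nio.charset.StandardCharsets;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class CheckSuccessFile {
        public static void main(String[] args) throws Exception {
            // Hypothetical location; point this at your job's output directory.
            Path success = new Path("s3a://my-bucket/output/_SUCCESS");

            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(success.toUri(), conf);

            FileStatus status = fs.getFileStatus(success);
            if (status.getLen() == 0) {
                // Zero bytes: written by the classic FileOutputCommitter.
                System.out.println("_SUCCESS is empty: an S3A committer was NOT used");
            } else {
                // Non-empty: an S3A committer wrote its JSON statistics here.
                byte[] data = new byte[(int) status.getLen()];
                try (FSDataInputStream in = fs.open(success)) {
                    in.readFully(data);
                }
                System.out.println("_SUCCESS contents (JSON from an S3A committer):");
                System.out.println(new String(data, StandardCharsets.UTF_8));
            }
        }
    }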

However, I don't think Beam works with it at all. AFAIK nobody has ever tested it.

Upvotes: 0
