Reputation: 1290
I want to write to a gcs bucket from dataproc using hudi.
To write to GCS using Hudi, the docs say to set the property fs.defaultFS to a gs:// value (https://hudi.apache.org/docs/gcs_hoodie).
However, when I set fs.defaultFS on Dataproc to a GCS bucket, I get errors at startup saying the job cannot find my jar. It is looking under a gs:/ prefix, presumably because I have overridden defaultFS, which it was previously using to find the jar. How would I fix this?
org.apache.spark.SparkException: Application application_1617963833977_0009 failed 2 times due to AM Container for appattempt_1617963833977_0009_000002 exited with exitCode: -1000
Failing this attempt.Diagnostics: [2021-04-12 15:36:05.142]java.io.FileNotFoundException: File not found : gs:/user/root/.sparkStaging/application_1617963833977_0009/myjar.jar
If it is relevant, I am setting the defaultFS from within the code: sparkConfig.set("spark.hadoop.fs.defaultFS", "gs://defaultFs")
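For context, a minimal sketch of how the property is being applied (the bucket name and app name are placeholders; the config is set on the SparkConf before the session is created):

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

val sparkConfig = new SparkConf()
  .setAppName("hudi-gcs-write")
  // Overrides the cluster's default filesystem for this job.
  .set("spark.hadoop.fs.defaultFS", "gs://my-bucket")

val spark = SparkSession.builder().config(sparkConfig).getOrCreate()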
Upvotes: 2
Views: 738
Reputation: 26458
You can try setting fs.defaultFS to GCS when creating the cluster, instead of from within the job, so that it is in effect before YARN stages your jar. For example:
gcloud dataproc clusters create ...\
--properties 'core:fs.defaultFS=gs://my-bucket'
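With the cluster-level default FS pointing at the bucket, the Hudi write can then target a GCS path directly. A rough sketch using the standard Hudi datasource options (bucket, table name, and key fields are placeholders, and df is assumed to be an existing DataFrame):

import org.apache.spark.sql.SaveMode

// Placeholder GCS base path for the Hudi table.
val basePath = "gs://my-bucket/hudi/my_table"

df.write
  .format("hudi")
  .option("hoodie.table.name", "my_table")
  .option("hoodie.datasource.write.recordkey.field", "id")
  .option("hoodie.datasource.write.precombine.field", "ts")
  .option("hoodie.datasource.write.partitionpath.field", "dt")
  .mode(SaveMode.Append)
  .save(basePath)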
Upvotes: 2