Reputation: 277
I want to check if a Google storage bucket exists using spark-scala. If it doesn't exist, create it.
Can somebody help?
Upvotes: 2
Views: 4264
Reputation: 11
You can check whether a given Google Cloud Storage path exists. If it does not, write an empty DataFrame to the path plus a folder, so that Spark creates the bucket with a folder inside, and then have Spark delete that folder.
I did it with PySpark, but you can easily translate it to Scala with the help of the following question.
from pyspark.sql.types import StructType

# Build a Hadoop Path for the target location and get its FileSystem
p = spark._jvm.org.apache.hadoop.fs.Path(path)
fs = p.getFileSystem(spark._jsc.hadoopConfiguration())

if not fs.exists(p):
    # Create an empty dataframe (no rows, empty schema)
    df = spark.createDataFrame([], StructType([]))
    df.write.mode("overwrite").parquet("gs://...../folder")
    # delete() expects a Hadoop Path, not a string; True = recursive
    fs.delete(spark._jvm.org.apache.hadoop.fs.Path("gs://..../folder"), True)
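A rough Scala translation of the snippet above might look like the following. This is only a sketch: the `gs://my-bucket/...` path is a placeholder, and an active `SparkSession` is assumed.

```scala
import org.apache.hadoop.fs.Path
import org.apache.spark.sql.SparkSession

object EnsurePath {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().getOrCreate()

    // Placeholder location; replace with your own bucket
    val p = new Path("gs://my-bucket/")
    val fs = p.getFileSystem(spark.sparkContext.hadoopConfiguration)

    if (!fs.exists(p)) {
      // Write an empty DataFrame so Spark creates the path with a folder,
      // then remove the helper folder (true = recursive delete)
      val df = spark.emptyDataFrame
      df.write.mode("overwrite").parquet("gs://my-bucket/folder")
      fs.delete(new Path("gs://my-bucket/folder"), true)
    }
  }
}
```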
Upvotes: 0
Reputation: 1520
To access a Google Cloud Storage bucket, use the Google Cloud client libraries.
Note that in order to read from or write to the bucket, you need to set the proper permissions on the storage.
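For example, a minimal sketch in Scala using the `google-cloud-storage` Java client (`com.google.cloud:google-cloud-storage`). The bucket name is a placeholder, and Application Default Credentials are assumed for authentication.

```scala
import com.google.cloud.storage.{BucketInfo, StorageOptions}

object EnsureBucket {
  def main(args: Array[String]): Unit = {
    // Uses Application Default Credentials for auth
    val storage = StorageOptions.getDefaultInstance.getService

    val bucketName = "my-example-bucket" // placeholder

    // storage.get returns null when the bucket does not exist
    if (storage.get(bucketName) == null) {
      storage.create(BucketInfo.of(bucketName))
    }
  }
}
```

Unlike the empty-DataFrame workaround, this creates the bucket directly through the Storage API, without going through the Hadoop connector.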
Upvotes: 1