Reputation: 913
PySpark version 2.4.0
I'm writing files to an S3 bucket I don't own, and then everyone else is having trouble reading the files. I think the issue is similar to this question: How to assign the access control list (ACL) when writing a CSV file to AWS in pyspark (2.2.0)?
But that solution no longer seems to work. I searched the PySpark docs but didn't find an answer. I tried:
from pyspark.sql import SparkSession

spark = SparkSession.\
    builder.\
    master("yarn").\
    appName(app_name).\
    enableHiveSupport().\
    getOrCreate()
spark.sparkContext.hadoopConfiguration.set("fs.s3a.acl.default", "BucketOwnerFullControl")
This is giving me: ERROR - {"exception": "'SparkContext' object has no attribute 'hadoopConfiguration'"}
Upvotes: 0
Views: 2801
Reputation: 1972
There are two issues at hand:

1) getOrCreate() returns your existing SparkSession if one is already running, so you won't be able to just set new options on it. You have to stop the SparkContext and build your SparkSession again with the new config. For example:

import pyspark
from pyspark.sql import SparkSession
spark = SparkSession.builder.master("local").getOrCreate()
sc = spark.sparkContext
conf = pyspark.SparkConf().setAll([('spark.executor.memory', '1g')])
# stop the sparkContext and set new conf
sc.stop()
spark = SparkSession.builder.config(conf=conf).getOrCreate()
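To confirm the rebuilt session actually picked up the new settings, you can read the value back from the active context (the spark.executor.memory key is just the example above):

# read the value back from the recreated context to confirm it took effect
print(spark.sparkContext.getConf().get('spark.executor.memory'))  # '1g'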
2) Hadoop-related settings have to be prefixed with spark.hadoop when they are set through SparkConf, so that Spark copies them into the Hadoop configuration. This means your config will become:

conf = pyspark.SparkConf().setAll([("spark.hadoop.fs.s3a.acl.default", "BucketOwnerFullControl")])
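Putting the two together for the original problem, a minimal sketch might look like the following. The bucket path and dataframe are placeholders, and it assumes you are writing through the s3a connector as in the original config key:

import pyspark
from pyspark.sql import SparkSession

# ACL option prefixed with spark.hadoop so it reaches the Hadoop configuration
conf = pyspark.SparkConf().setAll([
    ("spark.hadoop.fs.s3a.acl.default", "BucketOwnerFullControl"),
])

# stop any existing context, then rebuild the session with the new conf
spark = SparkSession.builder.master("yarn").getOrCreate()
spark.sparkContext.stop()
spark = SparkSession.builder.master("yarn").config(conf=conf).getOrCreate()

# hypothetical output path: objects written here should now grant the
# bucket owner full control
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "val"])
df.write.mode("overwrite").csv("s3a://bucket-you-dont-own/some/prefix/")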
Hope this helps.
Upvotes: 1