Sow

Reputation: 71

AWS Error Message: Requests specifying Server Side Encryption with AWS KMS managed keys require AWS Signature Version 4

I am facing the following error while writing to an S3 bucket using PySpark.

com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 400, AWS Service: Amazon S3, AWS Request ID: A0B0C0000000DEF0, AWS Error Code: InvalidArgument, AWS Error Message: Requests specifying Server Side Encryption with AWS KMS managed keys require AWS Signature Version 4.,

I have enabled server-side encryption with AWS KMS on the S3 bucket. I am using the following spark-submit command:

spark-submit --packages com.amazonaws:aws-java-sdk-pom:1.10.34,org.apache.hadoop:hadoop-aws:2.7.2 --jars sample-jar sample_pyspark.py 

This is the sample code I am working with:

from pyspark import SparkContext
from pyspark.sql import SQLContext, SparkSession

spark_context = SparkContext()
sql_context = SQLContext(spark_context)
spark = SparkSession.builder.appName('abc').getOrCreate()
hadoopConf = spark_context._jsc.hadoopConfiguration()
hadoopConf.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
# 'source_data' is an existing Spark DataFrame
source_data.coalesce(1).write.mode('overwrite').parquet("s3a://sample-bucket")

Note: Writing the same DataFrame to an S3 bucket without server-side encryption enabled succeeds.

Upvotes: 5

Views: 9412

Answers (1)

choy

Reputation: 426

The error seems to be telling you to enable V4 S3 signatures on the Amazon SDK. One way to do it is from the command line:

spark-submit --conf spark.driver.extraJavaOptions='-Dcom.amazonaws.services.s3.enableV4' \
    --conf spark.executor.extraJavaOptions='-Dcom.amazonaws.services.s3.enableV4' \
    ... (other spark options)
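If you'd rather not change the launch command, the same JVM flag can in principle be set through SparkConf before the session is created. A minimal sketch, untested:

    from pyspark import SparkConf
    from pyspark.sql import SparkSession

    # Pass the SigV4 system property to both driver and executor JVMs
    conf = SparkConf() \
        .set("spark.driver.extraJavaOptions", "-Dcom.amazonaws.services.s3.enableV4") \
        .set("spark.executor.extraJavaOptions", "-Dcom.amazonaws.services.s3.enableV4")
    spark = SparkSession.builder.config(conf=conf).appName('abc').getOrCreate()

One caveat: the Spark docs warn that spark.driver.extraJavaOptions may not take effect when set in code in client mode, because the driver JVM can already be running by then, so the --conf form above is the more reliable route.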

That said, I agree with Steve that you should use a more recent hadoop library.
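For reference, newer hadoop-aws releases (2.8+/3.x) let the S3A connector request SSE-KMS itself through Hadoop configuration rather than relying on bucket defaults. A hedged sketch, using the property names from the Hadoop 3.x S3A docs and a placeholder key ARN:

    # Ask S3A to apply SSE-KMS on upload (property names per Hadoop 3.x;
    # availability varies by hadoop-aws version)
    hadoopConf = spark.sparkContext._jsc.hadoopConfiguration()
    hadoopConf.set("fs.s3a.server-side-encryption-algorithm", "SSE-KMS")
    # placeholder ARN; substitute your own KMS key
    hadoopConf.set("fs.s3a.server-side-encryption.key",
                   "arn:aws:kms:REGION:ACCOUNT:key/KEY-ID")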


Upvotes: 2
