Reputation: 31

Issues reading and writing a KMS-encrypted Spark DataFrame to an S3 bucket with PySpark

I am trying to write a Spark DataFrame to an AWS S3 bucket using PySpark, and I am getting an exception saying that the encryption method specified is not supported. The bucket has server-side encryption set up.

I load the following packages from spark-defaults.conf:

spark.jars.packages com.amazonaws:aws-java-sdk:1.9.5, org.apache.hadoop:hadoop-aws:3.2.0

I reviewed the existing thread "Doesn't Spark/Hadoop support SSE-KMS encryption on AWS S3", which says that the above version should support SSE-KMS encryption.

I also added a core-site.xml with the property 'fs.s3a.server-side-encryption-algorithm' set to 'SSE-KMS'.

But I still get the error. Note that this works fine for buckets without SSE-KMS.

Error Message: AmazonS3Exception: Status Code: 400, AWS Service: Amazon S3, AWS Error Code: InvalidArgument, AWS Error Message: The encryption method specified is not supported

Upvotes: 2

Answers (2)

ny_09

Reputation: 31

Thanks for all your input, Steve. Adding the following to spark-defaults.conf fixed our issue:

spark.hadoop.fs.s3a.server-side-encryption-algorithm AES256
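For reference, the same kind of setting can also be applied when building the session in PySpark. The sketch below is only an illustration: `s3a_encryption_conf` and `build_session` are hypothetical helper names, and the key ARN in the usage note is a placeholder. Note that AES256 selects SSE-S3 (S3-managed keys); for SSE-KMS the algorithm value is SSE-KMS, and the key can be given in fs.s3a.server-side-encryption.key.

```python
from typing import Dict, Optional

ALGORITHM_KEY = "spark.hadoop.fs.s3a.server-side-encryption-algorithm"
KMS_KEY_KEY = "spark.hadoop.fs.s3a.server-side-encryption.key"


def s3a_encryption_conf(algorithm: str, kms_key: Optional[str] = None) -> Dict[str, str]:
    """Hypothetical helper: build Spark config entries for S3A server-side encryption.

    This sketch only handles the two modes discussed in this thread.
    """
    if algorithm not in ("AES256", "SSE-KMS"):
        raise ValueError(f"unsupported algorithm in this sketch: {algorithm}")
    conf = {ALGORITHM_KEY: algorithm}
    if kms_key is not None:
        if algorithm != "SSE-KMS":
            raise ValueError("a KMS key only makes sense with SSE-KMS")
        conf[KMS_KEY_KEY] = kms_key
    return conf


def build_session(conf: Dict[str, str]):
    """Create a SparkSession with the given config entries applied."""
    # Imported lazily so the helper above works without PySpark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName("s3a-sse-example")
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

Usage would look like `build_session(s3a_encryption_conf("AES256"))`, or `build_session(s3a_encryption_conf("SSE-KMS", "arn:aws:kms:...your-key..."))` with a real key ARN in place of the placeholder.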

Upvotes: 1

stevel

Reputation: 13430

Hadoop 3.2.0 absolutely supports SSE-KMS, so whatever the problem is, it will be one of: the SSE-KMS key used in the config, your permissions to access it, or some other quirk (e.g. the key isn't in the same region as the bucket).

But that release is built against AWS SDK 1.11.375 (see hadoop-aws on mvnrepository). Mixing JARs is generally doomed. That may be a factor; it may not.
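As a rough sketch of keeping the two JARs in step, the snippet below builds a spark.jars.packages value from a small version table. The table holds only the pairing stated above; the `jars_packages` helper is a hypothetical name, and the use of the `aws-java-sdk-bundle` artifact (rather than `aws-java-sdk`) is an assumption to check against the hadoop-aws POM for your release.

```python
# Pairing taken from the answer above: hadoop-aws 3.2.0 is built against
# AWS SDK 1.11.375. Extend this table only with versions you have verified.
HADOOP_AWS_TO_SDK = {"3.2.0": "1.11.375"}


def jars_packages(hadoop_aws_version: str) -> str:
    """Hypothetical helper: return a spark.jars.packages value with matching JARs."""
    try:
        sdk_version = HADOOP_AWS_TO_SDK[hadoop_aws_version]
    except KeyError:
        raise ValueError(
            f"no verified AWS SDK version recorded for hadoop-aws {hadoop_aws_version}"
        )
    coordinates = [
        f"org.apache.hadoop:hadoop-aws:{hadoop_aws_version}",
        # Assumption: the shaded SDK bundle is the artifact to pair with hadoop-aws.
        f"com.amazonaws:aws-java-sdk-bundle:{sdk_version}",
    ]
    return ",".join(coordinates)


if __name__ == "__main__":
    print("spark.jars.packages", jars_packages("3.2.0"))
```

Refusing unknown versions is deliberate here: guessing an SDK version is exactly the kind of JAR mixing the answer warns about.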

You got a 400 back from the far end, meaning something was rejected there.

Recommendations:

Look at the S3A troubleshooting page.
Download cloudstore and run its storediag command to bootstrap connectivity diagnostics.
Try using the AWS CLI to work with the data using the same settings.
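The last recommendation can also be approximated from Python. The sketch below swaps the AWS CLI for boto3, which is a different tool from the one suggested above but exercises the same S3 request: an upload with SSE-KMS requested explicitly. The bucket name, object key, and KMS key ID are placeholders, and `sse_kms_put_args` and `upload_test_object` are hypothetical helper names.

```python
from typing import Any, Dict


def sse_kms_put_args(bucket: str, key: str, body: bytes, kms_key_id: str) -> Dict[str, Any]:
    """Hypothetical helper: build put_object arguments that request SSE-KMS."""
    return {
        "Bucket": bucket,
        "Key": key,
        "Body": body,
        "ServerSideEncryption": "aws:kms",  # S3's name for SSE-KMS
        "SSEKMSKeyId": kms_key_id,
    }


def upload_test_object(bucket: str, key: str, kms_key_id: str) -> Dict[str, Any]:
    """Upload a tiny object with SSE-KMS; raises if S3 or KMS rejects the request."""
    # Imported lazily so the argument builder works without boto3 installed.
    import boto3

    s3 = boto3.client("s3")
    return s3.put_object(**sse_kms_put_args(bucket, key, b"sse-kms test", kms_key_id))
```

If this upload succeeds with the same credentials and key that Spark uses, the key and permissions are probably fine, and the problem is more likely on the Hadoop/SDK side.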

Note: it doesn't matter at all what the fs.s3a.encryption settings are when you are trying to read data. S3 knows the KMS key used and will automatically use it to decrypt, if you have the permissions. Reading is therefore a good way to check that you have read permissions on a key.
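A minimal sketch of that read check in PySpark follows. It assumes the data is Parquet and that `spark` is an existing SparkSession with S3A credentials configured; `can_read_sample` is a hypothetical helper name and the path in the usage note is a placeholder.

```python
from typing import Optional, Tuple


def can_read_sample(spark, path: str) -> Tuple[bool, Optional[str]]:
    """Hypothetical helper: try to read one row from `path`.

    Returns (True, None) on success, or (False, error text) on failure.
    No encryption settings are passed: S3 decrypts with the object's own
    KMS key if the caller has permission to use it.
    """
    try:
        # limit(1).collect() forces Spark to actually fetch some data.
        spark.read.parquet(path).limit(1).collect()
        return True, None
    except Exception as exc:  # broad on purpose: this is only a diagnostic probe
        return False, str(exc)
```

For example, `can_read_sample(spark, "s3a://your-bucket/some/prefix/")` returning `(False, ...)` with an access-denied message would point at KMS or bucket permissions rather than at the write-side encryption settings.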

Upvotes: 1
