Reputation: 1181
I am trying to write an RDD to S3 with server-side encryption. The following is my code:
val sparkConf = new SparkConf().
  setMaster("local[*]").
  setAppName("aws-encryption")
val sc = new SparkContext(sparkConf)
sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", AWS_ACCESS_KEY)
sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", AWS_SECRET_KEY)
sc.hadoopConfiguration.setBoolean("fs.s3n.sse.enabled", true)
sc.hadoopConfiguration.set("fs.s3n.enableServerSideEncryption", "true")
sc.hadoopConfiguration.setBoolean("fs.s3n.enableServerSideEncryption", true)
sc.hadoopConfiguration.set("fs.s3n.sse", "SSE-KMS")
sc.hadoopConfiguration.set("fs.s3n.serverSideEncryptionAlgorithm", "SSE-KMS")
sc.hadoopConfiguration.set("fs.s3n.server-side-encryption-algorithm", "SSE-KMS")
sc.hadoopConfiguration.set("fs.s3n.sse.kms.keyId", KMS_ID)
sc.hadoopConfiguration.set("fs.s3n.serverSideEncryptionKey", KMS_ID)
val rdd = sc.parallelize(Seq("one", "two", "three", "four"))
rdd.saveAsTextFile(s"s3n://$bucket/$objKey")
This code writes the RDD to S3, but without encryption: I have checked the properties of the written object, and server-side encryption shows as "no". Am I missing anything here, or am I using a property incorrectly?
Any suggestion would be appreciated.
P.S. I have set the same properties under different names because I am not sure which name to use when, e.g.
sc.hadoopConfiguration.setBoolean("fs.s3n.sse.enabled", true)
sc.hadoopConfiguration.set("fs.s3n.enableServerSideEncryption", "true")
sc.hadoopConfiguration.setBoolean("fs.s3n.enableServerSideEncryption", true)
Thank you.
Upvotes: 3
Views: 8637
Reputation: 13490
Example configuration for the S3A connector:
<property>
<name>fs.s3a.server-side-encryption-algorithm</name>
<value>AES256</value>
</property>
See Working with Encrypted Amazon S3 Data; these are currently (October 2019) the best docs on encrypting S3 data with S3A and Hadoop, Spark, and Hive.
AWS EMR readers: none of this applies to you. Switch to Apache Hadoop or consult the EMR docs.
Upvotes: 2