Stephen Paulger

Reputation: 5343

Can a SSE:KMS Key ID be specified when writing to S3 in an AWS Glue Job?

If you follow the AWS Glue Add Job Wizard to create a script that writes Parquet files to S3, you end up with generated code something like this:

datasink4 = glueContext.write_dynamic_frame.from_options(
    frame=dropnullfields3,
    connection_type="s3",
    connection_options={"path": "s3://my-s3-bucket/datafile.parquet"},
    format="parquet",
    transformation_ctx="datasink4",
)

Is it possible to specify a KMS key so that the data is encrypted in the bucket?

Upvotes: 4

Views: 5638

Answers (4)

Akshay Kabra

Reputation: 1

For Python, the way I did it:

# getKMSKey() is not a library function; it stands in for however you
# look up your KMS key ID.
spark_session.sparkContext._jsc.hadoopConfiguration().set(
    "fs.s3.enableServerSideEncryption", "true"
)
spark_session.sparkContext._jsc.hadoopConfiguration().set(
    "fs.s3.serverSideEncryption.kms.keyId", getKMSKey()
)

Upvotes: 0

Matt Buckland

Reputation: 11

This isn't necessary. Perhaps it was when the question was first posed, but the same can be achieved by creating a security configuration and associating it with the Glue job. Just remember to have this in your script, otherwise the security configuration won't be applied:

from awsglue.job import Job

job = Job(glueContext)
job.init(args['JOB_NAME'], args)

https://docs.aws.amazon.com/glue/latest/dg/encryption-security-configuration.html
https://docs.aws.amazon.com/glue/latest/dg/set-up-encryption.html
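
If you create the job with the API or CLI rather than the console, the security configuration can be attached at creation time. A minimal boto3 sketch, assuming a security configuration with SSE-KMS enabled already exists (all names below are placeholders, not from the question):

import boto3

glue = boto3.client("glue")

glue.create_job(
    Name="my-parquet-job",                      # placeholder job name
    Role="MyGlueServiceRole",                   # placeholder IAM role for the job
    Command={
        "Name": "glueetl",
        "ScriptLocation": "s3://my-s3-bucket/scripts/job.py",
    },
    # Name of a security configuration that enables SSE-KMS for S3 output.
    SecurityConfiguration="my-sse-kms-security-configuration",
)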

Upvotes: 1

LCC

Reputation: 948

To spell out the answer using PySpark, you can do either

from pyspark.conf import SparkConf
from pyspark.context import SparkContext
[...]
spark_conf = SparkConf().setAll([
  ("spark.hadoop.fs.s3.enableServerSideEncryption", "true"),
  ("spark.hadoop.fs.s3.serverSideEncryption.kms.keyId", "<Your Key ID>")
])
sc = SparkContext(conf=spark_conf)

(note the spark.hadoop prefix), or, uglier but shorter,

sc._jsc.hadoopConfiguration().set("fs.s3.enableServerSideEncryption", "true")
sc._jsc.hadoopConfiguration().set("fs.s3.serverSideEncryption.kms.keyId", "<Your Key ID>")

where sc is your current SparkContext.
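
Tying this back to the generated script in the question, the properties just need to be set on the Glue job's SparkContext before the write happens. A rough, untested sketch that reuses the question's variable names (dropnullfields3 is built by the rest of the generated script):

from awsglue.context import GlueContext
from pyspark.context import SparkContext

sc = SparkContext.getOrCreate()
glueContext = GlueContext(sc)

# Configure SSE-KMS before anything is written to S3.
sc._jsc.hadoopConfiguration().set("fs.s3.enableServerSideEncryption", "true")
sc._jsc.hadoopConfiguration().set("fs.s3.serverSideEncryption.kms.keyId", "<Your Key ID>")

# ... build dropnullfields3 as in the generated script ...

datasink4 = glueContext.write_dynamic_frame.from_options(
    frame=dropnullfields3,
    connection_type="s3",
    connection_options={"path": "s3://my-s3-bucket/datafile.parquet"},
    format="parquet",
    transformation_ctx="datasink4",
)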

Upvotes: 4

Natalia

Reputation: 4532

In a Glue Scala job:

import com.amazonaws.services.glue.GlueContext
import org.apache.spark.SparkContext

val spark: SparkContext = new SparkContext()
val glueContext: GlueContext = new GlueContext(spark)
spark.hadoopConfiguration.set("fs.s3.enableServerSideEncryption", "true")
spark.hadoopConfiguration.set("fs.s3.serverSideEncryption.kms.keyId", args("ENCRYPTION_KEY"))

I think the syntax would differ for Python, but the idea is the same.
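
A rough Python equivalent (a sketch only; it assumes the key ID is passed to the job as an ENCRYPTION_KEY argument, mirroring args("ENCRYPTION_KEY") above):

import sys

from awsglue.context import GlueContext
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

# Read the key ID from the job arguments.
args = getResolvedOptions(sys.argv, ["JOB_NAME", "ENCRYPTION_KEY"])

sc = SparkContext()
glueContext = GlueContext(sc)

# Same Hadoop properties as the Scala version, set via the underlying Java context.
sc._jsc.hadoopConfiguration().set("fs.s3.enableServerSideEncryption", "true")
sc._jsc.hadoopConfiguration().set("fs.s3.serverSideEncryption.kms.keyId", args["ENCRYPTION_KEY"])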

Upvotes: 5
