Reputation: 3167
I'm using PySpark to write to a Kafka broker. A JAAS security mechanism is set up, so we need to pass a username and password as environment variables:
import os

data_frame \
    .selectExpr('CAST(id AS STRING) AS key', "to_json(struct(*)) AS value") \
    .write \
    .format('kafka') \
    .option('topic', topic) \
    .option('kafka.ssl.endpoint.identification.algorithm', 'https') \
    .option('kafka.bootstrap.servers', os.environ['BOOTSTRAP_SERVER']) \
    .option('kafka.sasl.jaas.config',
            f'org.apache.kafka.common.security.plain.PlainLoginModule required username="{os.environ["USERNAME"]}" password="{os.environ["PASSWORD"]}";') \
    .option('kafka.sasl.mechanism', 'PLAIN') \
    .option('kafka.security.protocol', 'SASL_SSL') \
    .mode('append') \
    .save()
Locally I used Python's os.environ to retrieve environment variables. How do I pass these into an AWS Glue job?
Upvotes: 7
Views: 12461
Reputation: 8493
You could use Job Parameters:
import sys
from awsglue.utils import getResolvedOptions

args = getResolvedOptions(sys.argv,
                          ['JOB_NAME',
                           'BOOTSTRAP_SERVER',
                           'USERNAME',
                           'PASSWORD'])
data_frame \
    .selectExpr('CAST(id AS STRING) AS key', "to_json(struct(*)) AS value") \
    .write \
    .format('kafka') \
    .option('topic', topic) \
    .option('kafka.ssl.endpoint.identification.algorithm', 'https') \
    .option('kafka.bootstrap.servers', args['BOOTSTRAP_SERVER']) \
    .option('kafka.sasl.jaas.config',
            f'org.apache.kafka.common.security.plain.PlainLoginModule required username="{args["USERNAME"]}" password="{args["PASSWORD"]}";') \
    .option('kafka.sasl.mechanism', 'PLAIN') \
    .option('kafka.security.protocol', 'SASL_SSL') \
    .mode('append') \
    .save()
Then you could pass BOOTSTRAP_SERVER, USERNAME and PASSWORD as job parameters in the Glue console (with a `--` prefix), or programmatically with something like boto3:
import boto3

client = boto3.client('glue')
response = client.start_job_run(
    JobName='myGlueJob',
    Arguments={
        '--BOOTSTRAP_SERVER': 'myServer',
        '--USERNAME': 'myUsername',
        '--PASSWORD': 'myPassword'})
Note: you should consider storing credentials in something like AWS Secrets Manager and retrieving them in your Glue script, rather than passing them as plain-text job parameters (which are visible in the console and in job-run history).
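A minimal sketch of that approach, assuming the secret is stored as a JSON string with `username` and `password` keys (the secret name `my/kafka/creds` and the key names are hypothetical; the Glue job's IAM role would also need `secretsmanager:GetSecretValue` permission):

```python
import json


def get_kafka_creds(secret_string: str) -> tuple:
    """Parse a Secrets Manager SecretString into (username, password).

    Assumes the secret was stored as JSON with "username" and
    "password" keys -- adjust to match how you stored yours.
    """
    secret = json.loads(secret_string)
    return secret["username"], secret["password"]


def fetch_secret(secret_id: str, region: str = "us-east-1") -> str:
    """Fetch the raw SecretString for a secret from AWS Secrets Manager."""
    import boto3  # available in the Glue runtime

    client = boto3.client("secretsmanager", region_name=region)
    return client.get_secret_value(SecretId=secret_id)["SecretString"]


# In the Glue script, instead of reading job parameters:
# username, password = get_kafka_creds(fetch_secret("my/kafka/creds"))
```

The parsing is split out from the AWS call so the credential-handling logic stays testable without network access; only `fetch_secret` touches the AWS API.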
Upvotes: 5