Smaillns

Reputation: 3167

How to pass environment variables to AWS Glue

I'm using PySpark to write to a Kafka broker. The broker uses SASL authentication configured through JAAS, so we need to pass the username and password as environment variables:

    data_frame \
        .selectExpr('CAST(id AS STRING) AS key', "to_json(struct(*)) AS value") \
        .write \
        .format('kafka') \
        .option('topic', topic)\
        .option('kafka.ssl.endpoint.identification.algorithm', 'https') \
        .option('kafka.bootstrap.servers', os.environ['BOOTSTRAP_SERVER']) \
        .option('kafka.sasl.jaas.config', 
                 f'org.apache.kafka.common.security.plain.PlainLoginModule required username="{os.environ["USERNAME"]}" password="{os.environ["PASSWORD"]}";')\
        .option('kafka.sasl.mechanism', 'PLAIN')\
        .option('kafka.security.protocol', 'SASL_SSL')\
        .mode('append') \
        .save()

Locally I use Python's os.environ[...] to read the environment variables. How can I pass them to an AWS Glue job?

Upvotes: 7

Views: 12461

Answers (1)

Bob Haffner

Reputation: 8493

You could use job parameters and read them in the script with getResolvedOptions:

    import sys
    from awsglue.utils import getResolvedOptions

    # Resolve the job parameters passed to the Glue job as --KEY value pairs
    args = getResolvedOptions(sys.argv,
                              ['JOB_NAME',
                               'BOOTSTRAP_SERVER',
                               'USERNAME',
                               'PASSWORD'])

    # Inside the single-quoted f-string, use double quotes for the dict keys:
    # reusing the outer quote is a syntax error before Python 3.12
    data_frame \
        .selectExpr('CAST(id AS STRING) AS key', "to_json(struct(*)) AS value") \
        .write \
        .format('kafka') \
        .option('topic', topic) \
        .option('kafka.ssl.endpoint.identification.algorithm', 'https') \
        .option('kafka.bootstrap.servers', args['BOOTSTRAP_SERVER']) \
        .option('kafka.sasl.jaas.config',
                f'org.apache.kafka.common.security.plain.PlainLoginModule required username="{args["USERNAME"]}" password="{args["PASSWORD"]}";') \
        .option('kafka.sasl.mechanism', 'PLAIN') \
        .option('kafka.security.protocol', 'SASL_SSL') \
        .mode('append') \
        .save()

You can then pass BOOTSTRAP_SERVER, USERNAME and PASSWORD as job parameters in the Glue job console, or programmatically with something like boto3:

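    import boto3

    client = boto3.client('glue')

    # Argument keys carry a leading '--'; the script reads them without it
    response = client.start_job_run(
        JobName='myGlueJob',
        Arguments={
            '--BOOTSTRAP_SERVER': 'myServer',
            '--USERNAME': 'myUsername',
            '--PASSWORD': 'myPassword'})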
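
Since the question mentions reading os.environ locally, one way to keep a single script for both environments is to fall back to environment variables when awsglue cannot be imported. This is only a sketch: resolve_settings is a hypothetical helper name, and it assumes the awsglue package is present inside the Glue runtime and absent on your local machine.

```python
import os
import sys


def resolve_settings(names, argv=None, environ=None):
    """Return a dict of settings from Glue job parameters or, locally, env vars.

    Hypothetical helper: assumes awsglue is importable only inside Glue.
    """
    try:
        from awsglue.utils import getResolvedOptions
    except ImportError:
        # Local run: read the same names from the environment instead
        environ = os.environ if environ is None else environ
        return {name: environ[name] for name in names}
    # Glue run: parse the --NAME value pairs from the job arguments
    return getResolvedOptions(sys.argv if argv is None else argv, list(names))
```

The rest of the script can then use settings = resolve_settings(['BOOTSTRAP_SERVER', 'USERNAME', 'PASSWORD']) and index settings['USERNAME'] in both places.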

Note: consider storing the credentials in something like AWS Secrets Manager and retrieving them in your Glue script, rather than passing them as plain-text job parameters.
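
A minimal sketch of the Secrets Manager approach, assuming the secret is stored as a JSON string with username and password keys. The secret name kafka/credentials and both helper names are hypothetical; only get_secret_value and its SecretString field come from boto3.

```python
import json


def load_kafka_credentials(secret_id, client=None):
    """Return (username, password) from a JSON secret in AWS Secrets Manager.

    Assumes the secret value looks like {"username": "...", "password": "..."}.
    """
    if client is None:
        # Imported lazily so the function can be exercised with a stub client
        import boto3
        client = boto3.client("secretsmanager")
    response = client.get_secret_value(SecretId=secret_id)
    secret = json.loads(response["SecretString"])
    return secret["username"], secret["password"]


def build_jaas_config(username, password):
    """Build the JAAS string expected by the kafka.sasl.jaas.config option."""
    return (
        "org.apache.kafka.common.security.plain.PlainLoginModule required "
        f'username="{username}" password="{password}";'
    )


# Usage inside the Glue script (the secret name is an assumption):
# username, password = load_kafka_credentials("kafka/credentials")
# ... .option('kafka.sasl.jaas.config', build_jaas_config(username, password))
```

With this approach the job's IAM role needs permission to call secretsmanager:GetSecretValue on that secret, and only the secret name has to be passed to the job.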

Upvotes: 5
