Reputation: 41
I have created a Glue Dev Endpoint to test my code before deploying to AWS Glue. The project layout has a gluelibrary/ package that contains config.ini. I am able to successfully debug the code and have it run to completion. The way that I am calling the library in the dev environment looks like this:
import sys
import os
import time
from configobj import ConfigObj
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
import boto3
config = ConfigObj('/home/glue/scripts/gluelibrary/config.ini')
This process successfully finds all of the variables I defined in the config file and exits with exit code 0.
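For reference, here is roughly how the config values get consumed; the section and key names below are illustrative, not my actual config:
from configobj import ConfigObj

# Hypothetical config.ini:
# [s3]
# bucket = my-bucket
# prefix = data/
config = ConfigObj('/home/glue/scripts/gluelibrary/config.ini')
bucket = config['s3']['bucket']  # sections act like nested dicts
prefix = config['s3']['prefix']  # values come back as strings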
Note: the library that I developed was zipped and added to the S3 bucket where I told the Glue job to look for the .zip.
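For context, a minimal sketch of how such a zip can be built and uploaded; the local paths are illustrative, while the bucket and key match the --extra-py-files value shown in the job arguments below:
import os
import zipfile
import boto3

# Archive the package so gluelibrary/ sits at the root of the zip,
# which is what --extra-py-files expects for importable packages
with zipfile.ZipFile('gluelibrary.zip', 'w', zipfile.ZIP_DEFLATED) as zf:
    for root, _, files in os.walk('gluelibrary'):
        for name in files:
            path = os.path.join(root, name)
            zf.write(path, arcname=path)

boto3.client('s3').upload_file('gluelibrary.zip', 'BUCKET_NAME', 'Python/gluelibrary.zip')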
However, when I try to run the same code (with the exception of the file path) in the Glue console, I get an error:
import sys
import os
import time
from configobj import ConfigObj
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
import boto3
from gluelibrary.helpers import get_date
from gluelibrary.boto3_.s3_utils import delete_data_in_sub_directories, check_for_empty_bucket
from gluelibrary.boto3_.s3_utils import replace_data_in_sub_directories, check_bucket_existence
print('starting job.')
print(os.getcwd())
config = ConfigObj('/home/glue/gluelibrary/config.ini')
--conf spark.hadoop.yarn.resourcemanager.connect.max-wait.ms=60000
--conf spark.hadoop.fs.defaultFS=hdfs://IP_ADDRESS.internal:8020
--conf spark.hadoop.yarn.resourcemanager.address=IP_ADDRESS.internal:8032
--conf spark.dynamicAllocation.enabled=true
--conf spark.shuffle.service.enabled=true
--conf spark.dynamicAllocation.minExecutors=1
--conf spark.dynamicAllocation.maxExecutors=18
--conf spark.executor.memory=5g
--conf spark.executor.cores=4
--JOB_ID j_26c2ab188a2d8b7567006809c549f5894333cd38f191f58ae1f2258475ed03d1
--enable-metrics
--extra-py-files s3://BUCKET_NAME/Python/gluelibrary.zip
--JOB_RUN_ID jr_0292d34a8b82dad6872f5ee0cae5b3e6d0b1fbc503dca8a62993ea0f3b38a2ae
--scriptLocation s3://BUCKET_NAME/admin/JOB_NAME
--job-bookmark-option job-bookmark-enable
--job-language python
--TempDir s3://BUCKET_NAME/admin
--JOB_NAME JOB_NAME
YARN_RM_DNS=IP_ADDRESS.internal
Detected region us-east-2
JOB_NAME = JOB_NAME
Specifying us-east-2 while copying script.
Completed 6.6 KiB/6.6 KiB (70.9 KiB/s) with 1 file(s) remaining
download: s3://BUCKET_NAME/admin/JOB_NAME to ./script_2018-10-12-14-57-20.py
SCRIPT_URL = /tmp/g-6cad80fb460992d2c24a6f476b12275d2a9bc164-362894612904031505/script_2018-10-12-14-57-20.py
Upvotes: 4
Views: 6753
Reputation: 3687
If you need to access extra files from within your Glue job you have to:
Copy each file to a location on S3 that Glue has access to.
Include the full S3 key of each file, comma separated, in the --extra-files special parameter of your job.
Glue will then add those files to the --files param given to spark-submit, and you should be able to access them from within your Spark job as if they were in the working directory.
In your example you should be able to simply do:
config = ConfigObj("config.ini")
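For completeness, a sketch of supplying that parameter when starting the job via boto3; the job name and config key are placeholders:
import boto3

glue = boto3.client('glue')
# Special parameters are passed as job arguments; Glue forwards
# --extra-files to spark-submit's --files
glue.start_job_run(
    JobName='JOB_NAME',
    Arguments={
        '--extra-files': 's3://BUCKET_NAME/Python/config.ini',
    },
)
The same key can also be set once in the job's DefaultArguments so every run picks it up without passing it explicitly.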
Upvotes: 6