Reputation: 991
We created a Python shell job that connects to Redshift and fetches data. The program below works fine on my local system. The program and steps follow.
Program:
import sqlalchemy as sa
from sqlalchemy.orm import sessionmaker

# >>>>>>>> MAKE CHANGES HERE <<<<<<<<<<<<<
DATABASE = "#####"
USER = "#####"
PASSWORD = "#####"
HOST = "#####.redshift.amazonaws.com"
PORT = "5439"
SCHEMA = "test"  # default is "public"

####### Connection and session creation ##############
connection_string = "redshift+psycopg2://%s:%s@%s:%s/%s" % (USER, PASSWORD, HOST, PORT, DATABASE)
engine = sa.create_engine(connection_string)
session = sessionmaker()
session.configure(bind=engine)
s = session()
SetPath = "SET search_path TO %s" % SCHEMA
s.execute(SetPath)
###### All set: session created using the provided schema #######

################ Write queries from here ######################
query = "SELECT * FROM test1 LIMIT 2;"
rr = s.execute(query)
all_results = rr.fetchall()

def pretty(all_results):
    for row in all_results:
        print("row start >>>>>>>>>>>>>>>>>>>>")
        for r in row:
            print(" ----", r)
        print("row end >>>>>>>>>>>>>>>>>>>>>>")

pretty(all_results)

########## Close the session at the end ###############
s.close()
Steps:
I uploaded the files psycopg2-2.8.4-cp27-cp27m-win32.whl, Flask_SQLAlchemy-2.4.1-py2.py3-none-any.whl, and sqlalchemy_redshift-0.7.5-py2.py3-none-any.whl to S3 (s3://####/lib/), and mapped that folder in the Python library path of the AWS Glue job.
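For clarity, the setup above amounts to the following (bucket name anonymized as in the rest of the post; a sketch of what was configured, not the commands verbatim):
# Upload the wheel files to the library folder in S3
aws s3 cp psycopg2-2.8.4-cp27-cp27m-win32.whl s3://####/lib/
aws s3 cp Flask_SQLAlchemy-2.4.1-py2.py3-none-any.whl s3://####/lib/
aws s3 cp sqlalchemy_redshift-0.7.5-py2.py3-none-any.whl s3://####/lib/
# The Glue job's "Python library path" was then pointed at the folder:
#   s3://####/lib/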
When I run the program, the following error occurs.
Traceback (most recent call last):
  File "/tmp/runscript.py", line 113, in <module>
    download_and_install(args.extra_py_files)
  File "/tmp/runscript.py", line 56, in download_and_install
    download_from_s3(s3_file_path, local_file_path)
  File "/tmp/runscript.py", line 81, in download_from_s3
    s3.download_file(bucket_name, s3_key, new_file_path)
  File "/usr/local/lib/python2.7/site-packages/boto3/s3/inject.py", line 172, in download_file
    extra_args=ExtraArgs, callback=Callback)
  File "/usr/local/lib/python2.7/site-packages/boto3/s3/transfer.py", line 307, in download_file
    future.result()
  File "/usr/local/lib/python2.7/site-packages/s3transfer/futures.py", line 106, in result
    return self._coordinator.result()
  File "/usr/local/lib/python2.7/site-packages/s3transfer/futures.py", line 265, in result
    raise self._exception
botocore.exceptions.ClientError: An error occurred (404) when calling the HeadObject operation: Not Found
PS: The Glue job role has full access to S3.
Please suggest how to map those libraries so the program can use them.
Upvotes: 0
Views: 4781
Reputation: 17
There is a simple way to import Python dependencies using .whl files, which can be found on PyPI for the particular module.
You can also add multiple wheel files from S3 by separating them with commas, for example:
"s3://xxxxxxxxx/common/glue/glue_whl/fastparquet-0.4.1-cp37-cp37m-macosx_10_9_x86_64.whl,s3://xxxxxx/common/glue/glue_whl/packaging-20.4-py2.py3-none-any.whl,s3://xxxxxx/common/glue/glue_whl/s3fs-0.5.0-py3-none-any.whl"
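That comma-separated string is what goes in the job's "Python library path" field in the console, or in the --extra-py-files default argument on the CLI. A minimal sketch (the job name, role, and script location are placeholders, not values from this thread):
aws glue create-job --name my-wheel-test --role MyGlueRole \
    --command '{"Name": "pythonshell", "ScriptLocation": "s3://xxxxxx/common/glue/scripts/my_script.py"}' \
    --default-arguments '{"--extra-py-files": "s3://xxxxxxxxx/common/glue/glue_whl/fastparquet-0.4.1-cp37-cp37m-macosx_10_9_x86_64.whl,s3://xxxxxx/common/glue/glue_whl/packaging-20.4-py2.py3-none-any.whl,s3://xxxxxx/common/glue/glue_whl/s3fs-0.5.0-py3-none-any.whl"}'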
Upvotes: 0
Reputation: 2224
You can specify your own Python libraries packaged as .egg or .whl files via the --extra-py-files flag, as shown in the example below.
Command line example:
aws glue create-job --name python-redshift-test-cli --role role --command '{"Name" : "pythonshell", "ScriptLocation" : "s3://MyBucket/python/library/redshift_test.py"}'
--connections Connections=connection-name --default-arguments '{"--extra-py-files" : "s3://MyBucket/python/library/redshift_module-0.1-py2.7.egg,s3://MyBucket/python/library/redshift_module-0.1-py2.7-none-any.whl"}'
(Note that Glue default arguments are string key/value pairs, so --extra-py-files takes a single comma-separated string rather than a JSON array.)
Reference: Create a Glue job with an extra Python library
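Once the job runs, whatever those files package can be imported in the job script like any installed module. A minimal sketch, assuming the .egg/.whl above exposes a top-level package named redshift_module:
# redshift_test.py -- the Glue job script from ScriptLocation above.
# Assumes redshift_module is the top-level package provided by the
# redshift_module-0.1 .egg/.whl passed via --extra-py-files.
import redshift_module

# Confirm the package was loaded from the downloaded extra files.
print(redshift_module.__file__)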
Upvotes: 1