Reputation: 1
I am trying to schedule a Dataflow pipeline job that reads content from a Cloud SQL (SQL Server) instance and writes it to a BigQuery table. I'm using google.cloud.sql.connector (the cloud-sql-python-connector[pytds] package) to set up the connection. The Dataflow job runs successfully when I launch it manually from the Google Cloud Shell, but the Airflow version (using Google Cloud Composer) fails with: NameError: name 'Connector' is not defined
I have enabled the save_main_session option. I have also listed the connector module in the operator's py_requirements option, and it is being installed (as per the Airflow logs): py_requirements=['apache-beam[gcp]==2.41.0','cloud-sql-python-connector[pytds]==0.6.1','pyodbc==4.0.34','SQLAlchemy==1.4.41','pymssql==2.2.5','sqlalchemy-pytds==0.3.4','pylint==2.15.4']
[2022-11-02 07:40:53,308] {process_utils.py:173} INFO - Collecting cloud-sql-python-connector[pytds]==0.6.1
[2022-11-02 07:40:53,333] {process_utils.py:173} INFO - Using cached cloud_sql_python_connector-0.6.1-py2.py3-none-any.whl (28 kB)
But it seems the import is not working.
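For reference, here is a trimmed sketch of how the pipeline is launched from the DAG (the operator and all arguments other than py_requirements and save_main_session are placeholders, not my exact code):

```python
# Sketch of the Airflow task that launches the Dataflow job (paths, names and DAG
# arguments are placeholders; only py_requirements matches the actual list above).
from datetime import datetime

from airflow import DAG
from airflow.providers.apache.beam.operators.beam import BeamRunPythonPipelineOperator
from airflow.providers.google.cloud.operators.dataflow import DataflowConfiguration

with DAG(
    dag_id="cloudsql_sqlserver_to_bq",
    start_date=datetime(2022, 11, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    run_pipeline = BeamRunPythonPipelineOperator(
        task_id="run_pipeline",
        py_file="gs://my-bucket/pipelines/cloudsql_to_bq.py",  # placeholder path
        runner="DataflowRunner",
        pipeline_options={"save_main_session": True},          # --save_main_session
        py_requirements=[
            "apache-beam[gcp]==2.41.0",
            "cloud-sql-python-connector[pytds]==0.6.1",
            "pyodbc==4.0.34",
            "SQLAlchemy==1.4.41",
            "pymssql==2.2.5",
            "sqlalchemy-pytds==0.3.4",
            "pylint==2.15.4",
        ],
        py_interpreter="python3",
        py_system_site_packages=False,
        dataflow_config=DataflowConfiguration(
            job_name="cloudsql-to-bq", location="us-central1"  # placeholders
        ),
    )
```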
Upvotes: 0
Views: 207
Reputation: 6572
You have to install the PyPI packages on the Cloud Composer nodes as well; there is a tab for this in the GUI, on the Composer environment page.

Add all the packages needed by your Dataflow job in Composer via this page, except Apache Beam and Apache Beam GCP, because the Beam and Google Cloud dependencies are already installed in Cloud Composer.

Cloud Composer is the runner of your Dataflow job, and the runner instantiates the job. To be able to instantiate the job correctly, the runner needs to have the dependencies installed.
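If you prefer the command line to the GUI, something like this should also work (a sketch; the environment name and location are placeholders, and Beam / Beam GCP are deliberately left out):

```bash
# Sketch: add the job's dependencies to the Cloud Composer environment itself
# (environment name and location are placeholders).
cat > composer-requirements.txt <<'EOF'
cloud-sql-python-connector[pytds]==0.6.1
pyodbc==4.0.34
SQLAlchemy==1.4.41
pymssql==2.2.5
sqlalchemy-pytds==0.3.4
EOF

gcloud composer environments update my-composer-env \
    --location us-central1 \
    --update-pypi-packages-from-file composer-requirements.txt
```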
Then the Dataflow job, at execution time, will use the given py_requirements or setup.py file on the workers. py_requirements or setup.py must also contain the packages needed to execute the Dataflow job.
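For example, if you go the setup.py route (passed to the pipeline with --setup_file) instead of py_requirements, a minimal sketch could look like this, with the versions mirroring the list from the question:

```python
# setup.py shipped alongside the pipeline and referenced via --setup_file.
# A minimal sketch; the package name is illustrative.
import setuptools

setuptools.setup(
    name="cloudsql-to-bq-pipeline",
    version="0.1.0",
    packages=setuptools.find_packages(),
    install_requires=[
        "cloud-sql-python-connector[pytds]==0.6.1",
        "pyodbc==4.0.34",
        "SQLAlchemy==1.4.41",
        "pymssql==2.2.5",
        "sqlalchemy-pytds==0.3.4",
    ],
)
```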
Upvotes: 0