Parvathy Menon

Reputation: 1

Unable to use the google.cloud.sql.connector module in Google composer

I am trying to schedule a Dataflow pipeline job that reads content from a Cloud SQL SQL Server instance and writes it to a BigQuery table. I'm using google.cloud.sql.connector[pytds] to set up the connection. The manual Dataflow job runs successfully when I launch it from the Google Cloud Shell. The Airflow version (using Google Cloud Composer) fails with "NameError: name 'Connector' is not defined".
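For context, the relevant part of the pipeline looks roughly like this (a simplified sketch; the instance connection name, credentials, query, table names and schema are placeholders, not the real values):

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from google.cloud.sql.connector import Connector

def read_rows(_):
    # Open a connection to the Cloud SQL SQL Server instance via the connector.
    connector = Connector()
    conn = connector.connect(
        "my-project:my-region:my-sqlserver-instance",  # placeholder instance connection name
        "pytds",
        user="sqlserver",
        password="my-password",
        db="my-database",
    )
    cursor = conn.cursor()
    cursor.execute("SELECT id, name FROM my_table")  # placeholder query
    for row in cursor.fetchall():
        yield {"id": row[0], "name": row[1]}
    conn.close()
    connector.close()

def run():
    options = PipelineOptions(save_main_session=True)
    with beam.Pipeline(options=options) as p:
        (
            p
            | "Start" >> beam.Create([None])
            | "ReadFromCloudSQL" >> beam.FlatMap(read_rows)
            | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
                "my-project:my_dataset.my_table",
                schema="id:INTEGER,name:STRING",
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            )
        )

if __name__ == "__main__":
    run()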

I have enabled the save_main_session option. I have also listed the connector module in the py_requirements option, and it is being installed (as per the Airflow logs):

py_requirements=['apache-beam[gcp]==2.41.0', 'cloud-sql-python-connector[pytds]==0.6.1', 'pyodbc==4.0.34', 'SQLAlchemy==1.4.41', 'pymssql==2.2.5', 'sqlalchemy-pytds==0.3.4', 'pylint==2.15.4']

[2022-11-02 07:40:53,308] {process_utils.py:173} INFO - Collecting cloud-sql-python-connector[pytds]==0.6.1
[2022-11-02 07:40:53,333] {process_utils.py:173} INFO - Using cached cloud_sql_python_connector-0.6.1-py2.py3-none-any.whl (28 kB)

But it seems the import is not working.
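For reference, the DAG task is configured roughly like this (a sketch of what was described above; I'm assuming the BeamRunPythonPipelineOperator here, and the bucket, project and region values are placeholders):

from airflow import DAG
from airflow.providers.apache.beam.operators.beam import BeamRunPythonPipelineOperator
from airflow.providers.google.cloud.operators.dataflow import DataflowConfiguration
from airflow.utils.dates import days_ago

with DAG(
    dag_id="cloudsql_to_bigquery",
    start_date=days_ago(1),
    schedule_interval=None,
    catchup=False,
) as dag:
    run_pipeline = BeamRunPythonPipelineOperator(
        task_id="run_pipeline",
        runner="DataflowRunner",
        py_file="gs://my-bucket/pipelines/cloudsql_to_bq.py",  # placeholder path
        pipeline_options={
            "project": "my-project",
            "region": "us-central1",
            "temp_location": "gs://my-bucket/temp",
            "save_main_session": True,
        },
        py_requirements=[
            "apache-beam[gcp]==2.41.0",
            "cloud-sql-python-connector[pytds]==0.6.1",
            "pyodbc==4.0.34",
            "SQLAlchemy==1.4.41",
            "pymssql==2.2.5",
            "sqlalchemy-pytds==0.3.4",
            "pylint==2.15.4",
        ],
        py_interpreter="python3",
        py_system_site_packages=False,
        dataflow_config=DataflowConfiguration(
            job_name="cloudsql-to-bq",
            location="us-central1",
        ),
    )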

Upvotes: 0

Views: 207

Answers (1)

Mazlum Tosun

Reputation: 6572

You have to install the PyPI packages on the Cloud Composer nodes; there is a PyPI packages tab for this in the Composer environment page of the GUI:

(screenshot: the PyPI packages tab in the Cloud Composer environment page)

Add all the packages your Dataflow job needs to Composer via this page, except apache-beam and apache-beam[gcp], because Beam and the Google Cloud dependencies are already installed in Cloud Composer.
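If you prefer the command line over the GUI, the same installation can be done with gcloud (the environment name, location and file name below are just examples):

# requirements-composer.txt lists the packages to add to the Composer environment
# (without apache-beam / apache-beam[gcp], which are already installed), e.g.:
#   cloud-sql-python-connector[pytds]==0.6.1
#   pymssql==2.2.5
#   sqlalchemy-pytds==0.3.4
#   SQLAlchemy==1.4.41
#   pyodbc==4.0.34
gcloud composer environments update my-composer-env \
    --location us-central1 \
    --update-pypi-packages-from-file requirements-composer.txt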

Cloud Composer is the runner of your Dataflow job, and the runner is what instantiates the job. To instantiate the job correctly, the runner needs to have the dependencies installed.

Then, at execution time, the Dataflow job uses the given py_requirements or setup.py file to install the packages on the workers.

py_requirements or setup.py must therefore also contain the packages needed to execute the Dataflow job.
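For example, with the setup.py approach, a minimal file could look like this (a sketch reusing the versions from the question; the file is then passed to the pipeline with the --setup_file option instead of py_requirements):

# setup.py shipped with the pipeline: Dataflow installs these packages on the workers.
import setuptools

setuptools.setup(
    name="cloudsql-to-bq-pipeline",  # placeholder package name
    version="0.1.0",
    install_requires=[
        "cloud-sql-python-connector[pytds]==0.6.1",
        "pyodbc==4.0.34",
        "SQLAlchemy==1.4.41",
        "pymssql==2.2.5",
        "sqlalchemy-pytds==0.3.4",
    ],
    packages=setuptools.find_packages(),
)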

Upvotes: 0
