dko512

Reputation: 431

dbt and Google Cloud Composer PyPI dependency issues

I am currently running Google Cloud Composer version 2.0.9 with Airflow version 2.1.4. I am trying to install the most recent version of dbt (1.0.4 for dbt-core and 1.0.0 for the BigQuery plugin). Because Cloud Composer images have specific packages preinstalled, I am getting conflicting PyPI dependency issues: when I fix one dependency, another conflict appears. Does anyone know a specific set of package versions that would resolve this? I have read the following community posts, but I wanted to know whether anyone has a solution that uses only Composer.

How to run DBT in airflow without copying our repo

How to set up dbt with Google Cloud Composer?

Upvotes: 3

Views: 1299

Answers (2)

dko512

Reputation: 431

As mentioned by @Kabilan Mohanraj, the current version of dbt (1.0.4) has dependency conflicts with a recent version of Composer (Composer 2.0.9 with Airflow 2.1.4), so an alternative solution is needed. After some experimenting and searching through solutions from other people in the community, I found a combination of Composer and dbt versions that has only minimal dependency conflicts. However, as @Kabilan Mohanraj also notes, Google does not recommend downgrading preinstalled packages, so this would not be a viable solution for production.

Create the Composer environment through gcloud to use an older image version that is not available in the Composer UI:

gcloud composer environments create my_airflow_dbt_example \
    --location us-central1 \
    --image-version composer-1.17.9-airflow-2.1.4

requirements

dbt-bigquery==0.21.0
jsonschema==3.1.1
packaging==20.9

For this specific Composer version, these pins downgrade jsonschema from 3.2.0 to 3.1.1 and packaging from 21.3 to 20.9.

Upvotes: 2

Kabilan Mohanraj

Reputation: 1906

I was able to reproduce the behaviour you are seeing. Below are the dependency conflicts I saw in the Cloud Build logs. These conflicts are occurring between the dbt-core requirements and the pre-installed package requirements in Composer.

Pre-installed package requirements:

hologram 0.0.14 has requirement jsonschema<3.2,>=3.0, but you have jsonschema 3.2.0. ##=> can be installed manually
flask 1.1.4 has requirement click<8.0,>=5.1, but you have click 8.1.2.
apache-airflow 2.1.4+composer has requirement markupsafe<2.0,>=1.1.1, but you have markupsafe 2.0.1.
looker-sdk 22.4.0 has requirement typing-extensions>=4.1.1, but you have typing-extensions 3.10.0.2.

dbt-core requirements:

hologram 0.0.14 has requirement jsonschema<3.2,>=3.0, but you have jsonschema 3.2.0. ##=> can be installed manually
dbt-core 1.0.4 has requirement click<9,>=8, but you have click 7.1.2.
dbt-core 1.0.4 has requirement MarkupSafe==2.0.1, but you have markupsafe 1.1.1.
dbt-core 1.0.4 has requirement typing-extensions<3.11,>=3.7.4, but you have typing-extensions 4.1.1.
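
The two lists above can be compared range by range to see why no single pin satisfies both sides. The following is a minimal Python sketch using only the standard library. It is not pip's resolver: the version parsing handles plain dotted numeric versions only, and the helper names (`parse_version`, `satisfies`, `common_versions`) are made up for illustration. It tests only the versions that appear as bounds, which is enough here because every pair of ranges has an inclusive lower bound or an exact pin.

```python
import operator
import re

# Version ranges copied from the two conflict lists above.
COMPOSER_SIDE = {
    "click": "<8.0,>=5.1",            # required by flask 1.1.4
    "markupsafe": "<2.0,>=1.1.1",     # required by apache-airflow 2.1.4+composer
    "typing-extensions": ">=4.1.1",   # required by looker-sdk 22.4.0
}
DBT_SIDE = {                          # all required by dbt-core 1.0.4
    "click": "<9,>=8",
    "markupsafe": "==2.0.1",
    "typing-extensions": "<3.11,>=3.7.4",
}

OPS = {"==": operator.eq, ">=": operator.ge, "<=": operator.le,
       "<": operator.lt, ">": operator.gt}
CLAUSE = re.compile(r"(==|>=|<=|<|>)\s*([0-9.]+)")


def parse_version(text, width=4):
    """'8' -> (8, 0, 0, 0). Simplification: plain numeric versions only."""
    parts = [int(p) for p in text.split(".")]
    return tuple(parts + [0] * (width - len(parts)))


def clauses(spec):
    """Split '<8.0,>=5.1' into [('<', '8.0'), ('>=', '5.1')]."""
    return [CLAUSE.fullmatch(c.strip()).groups() for c in spec.split(",")]


def satisfies(version, spec):
    """True if the version meets every clause of the spec."""
    v = parse_version(version)
    return all(OPS[op](v, parse_version(bound)) for op, bound in clauses(spec))


def common_versions(spec_a, spec_b):
    """Return the bound versions from either spec that satisfy both specs."""
    candidates = [bound for _, bound in clauses(spec_a) + clauses(spec_b)]
    return [v for v in candidates if satisfies(v, spec_a) and satisfies(v, spec_b)]


if __name__ == "__main__":
    for name in COMPOSER_SIDE:
        print(name, common_versions(COMPOSER_SIDE[name], DBT_SIDE[name]))
```

For click, markupsafe, and typing-extensions the result is an empty list: the ranges do not overlap, so satisfying one side always breaks the other.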

I tried downgrading the pre-installed packages, but subsequent package installations then fail, and downgrading them is not recommended anyway.

Therefore, I would suggest using an external solution, as described in the thread you linked. Quoting the workarounds given in @Ryan Yuan's answer there:

  1. Using external services to run dbt jobs, e.g. Cloud Run.
  2. Using Composer's KubernetesPodOperator (updated Composer 2 link). My colleague has put up a nice article on the dbt Discourse going through the setup process.
  3. Ignoring Composer's dependency conflicts by setting Composer's environment variable IGNORE_PYPI_DEPENDENCY_CONFLICTS to True. However, I don't recommend this as it may cause potential issues.
  4. Creating a Python virtual environment in Composer and installing the dbt packages in it.
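
Workaround 4 can be sketched with Python's standard-library venv module. The snippet below is an illustration under assumptions, not a tested Composer setup: the helper names and the venv path are hypothetical, the package pins are the versions from the question, and whether a venv created on a worker persists between tasks is something to verify for your environment. The point is that dbt's click and MarkupSafe pins live inside the venv and never touch the packages Airflow itself imports.

```python
import subprocess
import sys
from pathlib import PurePosixPath

# Versions taken from the question; adjust as needed.
DBT_PACKAGES = ["dbt-core==1.0.4", "dbt-bigquery==1.0.0"]


def build_commands(venv_dir, packages=DBT_PACKAGES):
    """Return the argv lists that create a venv and install dbt into it.

    POSIX paths are assumed (bin/pip, bin/dbt), since Composer workers
    run Linux.
    """
    venv_dir = PurePosixPath(venv_dir)
    pip = venv_dir / "bin" / "pip"
    dbt = venv_dir / "bin" / "dbt"
    return [
        [sys.executable, "-m", "venv", str(venv_dir)],  # create the venv
        [str(pip), "install", *packages],               # install dbt inside it
        [str(dbt), "--version"],                        # smoke-test the install
    ]


def create_dbt_venv(venv_dir):
    """Run the commands in order, stopping at the first failure."""
    for argv in build_commands(venv_dir):
        subprocess.run(argv, check=True)
```

A task could then call, for example, `create_dbt_venv("/tmp/dbt_venv")` (a hypothetical path) and invoke `/tmp/dbt_venv/bin/dbt` for the actual dbt commands.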

Upvotes: 3
