Karan Alang

Reputation: 1063

GCP Composer Airflow - unable to install packages using PyPi

I have created a Composer environment with image version -> composer-2.0.13-airflow-2.2.5

When I try to install packages using PyPI, it fails. Details below:

Command:
gcloud composer environments update $AIRFLOW --location us-east1 --update-pypi-packages-from-file requirements.txt


requirements.txt
----------------
google-api-core
google-auth
google-auth-oauthlib
google-cloud-bigquery
google-cloud-core
google-cloud-storage
google-crc32c
google-resumable-media
googleapis-common-protos
google-endpoints
joblib
json5
jsonschema
pandas
requests
requests-oauthlib


Error:
Karans-MacBook-Pro:composer_dags karanalang$ gcloud composer environments update $AIRFLOW     --location us-east1      --update-pypi-packages-from-file requirements.txt
Waiting for [projects/versa-sml-googl/locations/us-east1/environments/versa-composer3] to be updated with [projects/versa-sml-googl/locations/us-east1/operations/c23b77a9-f46b-4222-bafd-62527bf27239]..
.failed.                                                                                                                                                                                                 
ERROR: (gcloud.composer.environments.update) Error updating [projects/versa-sml-googl/locations/us-east1/environments/versa-composer3]: Operation [projects/versa-sml-googl/locations/us-east1/operations/c23b77a9-f46b-4222-bafd-62527bf27239] failed: Failed to install PyPI packages. looker-sdk 22.4.0 has requirement attrs>=20.1.0; python_version >= "3.7", but you have attrs 17.4.0.
 Check the Cloud Build log at https://console.cloud.google.com/cloud-build/builds/60ac972a-8f5e-4b4f-a4a7-d81049fb19a3?project=939354532596 for details. For detailed instructions see https://cloud.google.com/composer/docs/troubleshooting-package-installation


Please note: I have an older Composer cluster (Composer version 1.16.8, Airflow version 1.10.15) where the above command works fine. However, it is not working with the new cluster.

What needs to be done to debug/fix this?

tia!

Upvotes: 0

Views: 960

Answers (2)

Karan Alang

Reputation: 1063

I was able to get this working using the following code:

path = "gs://dataproc-spark-configs/pip_install.sh"

CLUSTER_GENERATOR_CONFIG = ClusterGenerator(
    project_id=PROJECT_ID,
    zone="us-east1-b",
    master_machine_type="n1-standard-4",
    worker_machine_type="n1-standard-4",
    num_workers=4,
    storage_bucket="dataproc-spark-logs",
    init_actions_uris=[path],
    metadata={'PIP_PACKAGES': 'pyyaml requests pandas openpyxl kafka-python'},
).make()


with models.DAG(
    'Versa-Alarm-Insights-UsingComposer2',
        # Continue to run DAG twice per day
        default_args=default_dag_args,
        schedule_interval='0 0/12 * * *',
        catchup=False,
        ) as dag: 

        create_dataproc_cluster = DataprocCreateClusterOperator(
          task_id="create_dataproc_cluster",  
          cluster_name="versa-composer2",
          region=REGION,
          cluster_config=CLUSTER_GENERATOR_CONFIG
     )
     

The earlier approach of installing packages by reading from a file was working in Composer 1 (Airflow 1.x); however, it fails with Composer 2.x (Airflow 2.x).
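
The pip_install.sh init action referenced by path above is not shown here. A minimal sketch of such a script, assuming it follows the usual Dataproc pattern of reading the PIP_PACKAGES metadata key set in ClusterGenerator, would look roughly like this:

#!/bin/bash
# Sketch only - assumes the real gs://dataproc-spark-configs/pip_install.sh
# follows the standard "install packages listed in PIP_PACKAGES metadata" pattern.
set -euxo pipefail

# Read the space-separated package list from the instance metadata server
PIP_PACKAGES="$(curl -fs -H 'Metadata-Flavor: Google' \
  'http://metadata.google.internal/computeMetadata/v1/instance/attributes/PIP_PACKAGES' || true)"

if [[ -n "${PIP_PACKAGES}" ]]; then
  # Intentionally unquoted so each package becomes a separate pip argument
  pip install --upgrade ${PIP_PACKAGES}
fi

With this approach the Python dependencies are installed on the Dataproc cluster nodes at creation time, so the Composer environment itself no longer needs the conflicting PyPI packages.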

Upvotes: 1

Vishal Bulbule

Reputation: 309

From the error, it is clear that you are running an old version of the attrs package.

Run the command below and try again:

pip install attrs==20.3.0

or

pip install attrs==20.1.0
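
Since Composer manages PyPI packages through the environment update API rather than direct pip installs, the pin would presumably need to go into requirements.txt and be applied with the same update command, for example:

attrs==20.3.0

gcloud composer environments update $AIRFLOW --location us-east1 --update-pypi-packages-from-file requirements.txt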

Upvotes: 0
