codninja0908
codninja0908

Reputation: 567

Airflow 2.0 support for DataprocClusterCreateOperator

In our project we are using DataprocClusterCreateOperator which was under contrib from airflow.contrib.operators import dataproc_operator. It is working fine with airflow version 1.10.14.

We are in a process of upgrading to Airflow 2.1.2 wherein while testing or dags which requires spinning of DataProc Cluster we found error as airflow.exceptions.AirflowException: Invalid arguments were passed to DataprocClusterCreateOperator (task_id: <task_id>). Invalid arguments were: **kwargs: {'config_bucket': None, 'autoscale_policy': None}

I am not able to see any links for this operator support in Airflow 2 so that I can identify the new params or the changes which happened. Please share the relevant link.

We are using google-cloud-composer version 1.17.2 having Airflow version 2.1.2.

Upvotes: 1

Views: 1158

Answers (2)

Kabilan Mohanraj
Kabilan Mohanraj

Reputation: 1906

The supported parameters for the DataprocCreateClusterOperator in Airflow 2 can be found here, in the source code. The cluster configuration parameters that can be passed to the operator can be found here.

The DataprocClusterCreateOperator has been renamed as DataprocCreateClusterOperator since January 13, 2020 as per this Github commit and has been ported from airflow.contrib.operators to airflow.providers.google.cloud.operators.dataproc import path.

As given in @itroulli's answer, an example implementation of the operator can be found here.

Upvotes: 1

itroulli
itroulli

Reputation: 2094

Since Airflow 2.0, 3rd party provider (like Google in this case) operators/hooks has been moved away from Airflow core to separate providers packages. You can read more here.

Since you are using Cloud Composer, the Google providers package is already installed.

Regarding the DataprocClusterCreateOperator, it has been renamed to DataprocCreateClusterOperator and moved to airflow.providers.google.cloud.operators.dataproc so you can import it with:

from airflow.providers.google.cloud.operators.dataproc import DataprocCreateClusterOperator

The accepted parameters differ from the one included in Airflow 1.x. You can find an example of usage here.

Upvotes: 1

Related Questions