Reuben Pereira

Reputation: 11

How to specify a BigQuery table as an input source for the AutoMLImportDataOperator Airflow Operator?

As per the documentation available in the Airflow Docs here, GCS can be configured as an input source for the AutoMLImportDataOperator. However, I'm curious as to how a BQ table can be used since there is functionality for it within AutoML Tables itself. Any suggestions would be appreciated.

Upvotes: 1

Views: 177

Answers (2)

raj

Reputation: 1213

AutoML Tables supports both BigQuery and GCS as input sources. To specify the location of your training data, use a BigQuery URI, which must conform to the following format: bq://<project_id>.<dataset_id>.<table_id>

In your Airflow DAG, use AutoMLImportDataOperator with an input_config like the following:

IMPORT_INPUT_CONFIG = {
    "bigquery_source": {
        "input_uri": "bq://{}.{}.{}".format(project_id, bq_dataset, bq_table)
    }
}

import_dataset_task = AutoMLImportDataOperator(
    task_id="import_dataset_task",
    dataset_id=dataset_id,
    location=GCP_AUTOML_LOCATION,
    input_config=IMPORT_INPUT_CONFIG,
)

You can refer to the Airflow example DAG here for a more complete example that uses GCS as the source; just update the IMPORT_INPUT_CONFIG variable with your BigQuery URI.

Upvotes: 1

Federico Taranto

Reputation: 132

At the moment, the AutoML Tables integration for Airflow in GCP is still in pre-release [0], and as far as I understand, this functionality is not yet available in the Airflow operators [1]. You can use the bigquery_to_gcs operator [2] to move your BQ data into GCS and then use the AutoMLImportDataOperator on the exported files. Alternatively, you can try creating a custom operator [3].
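A minimal sketch of the two-step workaround, export the table with bigquery_to_gcs, then import the exported CSV with AutoMLImportDataOperator. The helper below only builds the argument dicts for the two operators; the bucket name and CSV path convention are placeholders for illustration, not anything prescribed by Airflow:

```python
def build_export_and_import_configs(project_id, bq_dataset, bq_table, gcs_bucket):
    """Return (bigquery_to_gcs kwargs, AutoML input_config) for one BQ table."""
    # Destination of the intermediate export; path layout is an assumption.
    export_uri = "gs://{}/{}.csv".format(gcs_bucket, bq_table)

    # Keyword arguments for the bigquery_to_gcs export task.
    export_kwargs = {
        "source_project_dataset_table": "{}.{}.{}".format(
            project_id, bq_dataset, bq_table
        ),
        "destination_cloud_storage_uris": [export_uri],
        "export_format": "CSV",
    }

    # input_config for the downstream AutoMLImportDataOperator task,
    # now pointing at GCS instead of BigQuery.
    import_input_config = {"gcs_source": {"input_uris": [export_uri]}}
    return export_kwargs, import_input_config
```

In the DAG you would pass export_kwargs to the bigquery_to_gcs operator and import_input_config to AutoMLImportDataOperator, with the export task set upstream of the import task.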

[0] https://cloud.google.com/automl-tables/docs/integrations

[1] https://airflow.readthedocs.io/en/latest/howto/operator/gcp/automl.html

[2] https://airflow.apache.org/docs/stable/_modules/airflow/contrib/operators/bigquery_to_gcs.html

[3] https://airflow.readthedocs.io/en/latest/howto/custom-operator.html

Upvotes: 0
