Reputation: 11
Per the documentation available in the Airflow Docs here, GCS can be configured as an input source for the AutoMLImportDataOperator. However, I'm curious how a BigQuery table can be used, since AutoML Tables itself supports that. Any suggestions would be appreciated.
Upvotes: 1
Views: 177
Reputation: 1213
AutoML Tables supports both BigQuery and GCS as sources. You can use a BigQuery URI to specify the location of your training data; it must conform to the following format: bq://<project_id>.<dataset_id>.<table_id>
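For example, a hypothetical table transactions in dataset sales_data under project my-project would be referenced as bq://my-project.sales_data.transactions.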
In the Airflow DAG, you can use AutoMLImportDataOperator with an input_config as below:
from airflow.providers.google.cloud.operators.automl import AutoMLImportDataOperator

# BigQuery source URI format: bq://<project_id>.<dataset_id>.<table_id>
IMPORT_INPUT_CONFIG = {"bigquery_source": {"input_uri": "bq://{}.{}.{}".format(project_id, bq_dataset, bq_table)}}

import_dataset_task = AutoMLImportDataOperator(
    task_id="import_dataset_task",
    dataset_id=dataset_id,
    location=GCP_AUTOML_LOCATION,
    input_config=IMPORT_INPUT_CONFIG,
)
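If the AutoML Tables dataset doesn't exist yet, dataset_id can come from a preceding create step. A rough sketch, assuming a placeholder display name and the XCom key pattern used in the Airflow example DAG:

from airflow.providers.google.cloud.operators.automl import AutoMLCreateDatasetOperator

# Minimal AutoML Tables dataset spec; the display name is a placeholder.
DATASET = {"display_name": "my_tables_dataset", "tables_dataset_metadata": {}}

create_dataset_task = AutoMLCreateDatasetOperator(
    task_id="create_dataset_task",
    dataset=DATASET,
    location=GCP_AUTOML_LOCATION,
)

# The operator pushes the new dataset's ID to XCom under the "dataset_id" key;
# dataset_id is a templated field on AutoMLImportDataOperator, so this resolves at runtime.
dataset_id = "{{ task_instance.xcom_pull('create_dataset_task', key='dataset_id') }}"

create_dataset_task >> import_dataset_task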
You can refer to the Airflow example DAG here for a more complete example with GCS as the source; you just have to update the IMPORT_INPUT_CONFIG variable with the BigQuery URI.
Upvotes: 1
Reputation: 132
At the moment the AutoML Tables integration with Airflow is still in pre-release [0], and as far as I understand this functionality is not available in the Airflow operators [1] yet. You can use the bigquery_to_gcs operator [2] to move your BQ data into GCS and then use the AutoMLImportDataOperator (see the sketch after the links below). Alternatively, you can try to create your own custom operator [3].
[0] https://cloud.google.com/automl-tables/docs/integrations
[1] https://airflow.readthedocs.io/en/latest/howto/operator/gcp/automl.html
[2] https://airflow.apache.org/docs/stable/_modules/airflow/contrib/operators/bigquery_to_gcs.html
[3] https://airflow.readthedocs.io/en/latest/howto/custom-operator.html
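A minimal sketch of that workaround, assuming the contrib BigQueryToCloudStorageOperator from Airflow 1.10 and placeholder project, bucket, and dataset values:

from airflow.contrib.operators.bigquery_to_gcs import BigQueryToCloudStorageOperator
from airflow.providers.google.cloud.operators.automl import AutoMLImportDataOperator

# Export the BigQuery table to a CSV file in GCS (placeholder names throughout).
export_to_gcs_task = BigQueryToCloudStorageOperator(
    task_id="export_to_gcs_task",
    source_project_dataset_table="my-project.my_dataset.my_table",
    destination_cloud_storage_uris=["gs://my-bucket/exports/data.csv"],
    export_format="CSV",
)

# Import the exported CSV from GCS into the AutoML Tables dataset.
import_from_gcs_task = AutoMLImportDataOperator(
    task_id="import_from_gcs_task",
    dataset_id=dataset_id,
    location=GCP_AUTOML_LOCATION,
    input_config={"gcs_source": {"input_uris": ["gs://my-bucket/exports/data.csv"]}},
)

export_to_gcs_task >> import_from_gcs_task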
Upvotes: 0