Alen T Mathew

Reputation: 110

Using Google Cloud Composer to run a BigQuery query

I am new to Google Cloud Composer and Apache Airflow.

I am trying to query BigQuery by creating a DAG.

import datetime

import airflow
from airflow.operators import bash_operator


from airflow.contrib.operators import bigquery_operator


YESTERDAY = datetime.datetime.now() - datetime.timedelta(days=1)

default_args = {
    'owner': 'me',
    'depends_on_past': False,
    'email': [''],
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': datetime.timedelta(minutes=5),
    'start_date': YESTERDAY,
}

with airflow.DAG(
        'composer_test_dag',
        catchup=False,
        default_args=default_args,
        schedule_interval=datetime.timedelta(days=1)) as dag:

    bq_recent_questions_query = bigquery_operator.BigQueryOperator(
        task_id='bq_weather_query',
        bql="""
        SELECT owner_display_name, title, view_count
        FROM `bigquery-public-data.stackoverflow.posts_questions`
        ORDER BY view_count DESC
        LIMIT 100
        """,
        use_legacy_sql=False)

Is this the correct way? How can I get the query results from this?

Upvotes: 0

Views: 2515

Answers (1)

Minato

Reputation: 462

The BigQueryOperator is generally used to execute a query in BigQuery and then load the result into another BigQuery table (a transform operation). I assume you're trying to select three columns from a BigQuery public table and load them into another table, so provide a destination_dataset_table in the BigQueryOperator.

Please note the following:

  1. The stackoverflow.posts_questions table is very large, and even if you use LIMIT, BigQuery will still scan the entire table, so beware of the cost.
  2. Use the sql param instead of bql, as bql is deprecated.
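Putting both points together, a minimal sketch of the operator might look like the following. The destination project, dataset, and table names here are placeholders you would replace with your own:

```python
from airflow.contrib.operators import bigquery_operator

# Sketch: same query as in the question, but with the deprecated `bql`
# param replaced by `sql`, and a destination table so the results are
# persisted instead of discarded.
bq_recent_questions_query = bigquery_operator.BigQueryOperator(
    task_id='bq_weather_query',
    sql="""
    SELECT owner_display_name, title, view_count
    FROM `bigquery-public-data.stackoverflow.posts_questions`
    ORDER BY view_count DESC
    LIMIT 100
    """,
    use_legacy_sql=False,
    # Hypothetical destination; use your own project.dataset.table.
    destination_dataset_table='my_project.my_dataset.recent_questions',
    # Overwrite the destination table on each run.
    write_disposition='WRITE_TRUNCATE')
```

After the DAG runs, you can query `my_project.my_dataset.recent_questions` directly in the BigQuery console (or from another task) to see the results.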

Upvotes: 2
