user10286505
user10286505

Reputation: 1

Apache Airflow - MySQL to BigQuery - how to get data from previous month?

We are building a Data Warehouse solution in Google BigQuery using Apache Airflow to transfer MySQL tables to the cloud. I have daily transfers up and running fetching data incrementally using

MySQLToGoogleCloudStorageOperator => GoogleCloudStorageTBigQueryOperator

with that bit in .sql:

WHERE timestamp BETWEEN '{{ macros.ds_add(ds, -1) }}' AND '{{ ds }}'

It's working perfectly but we have one table which we would like to update MONTHLY, on 10th of each month but fetching data from the whole previous month - so When ran with an execution date of 2019-03-10 we want to fetch

timestamp BETWEEN '2019-02-01 00:00:00' AND '2019-03-01 00:00:00'

and so on. What's the best and simplest way to do that?

Upvotes: 0

Views: 1487

Answers (3)

Breathe
Breathe

Reputation: 724

Could you not simply play with the execution date? -> first day of last month

{{ ds.ds_add(-30).replace(day=1) }}

-> first days of following month

{{ ds.replace(day=1) }}

Upvotes: 0

Yun Zhang
Yun Zhang

Reputation: 5518

#standardSQL
SELECT
  CURRENT_DATE() as today,
  DATE_TRUNC(CURRENT_DATE(), MONTH) as first_day_this_month,
  DATE_SUB(DATE_TRUNC(CURRENT_DATE(), MONTH), INTERVAL 1 MONTH) as first_day_last_month;

Results: enter image description here

Upvotes: 0

Raunak Jhawar
Raunak Jhawar

Reputation: 1651

Have you considered Cloud BQ partitions?

You can create monthly partitions (which are dynamic in nature) and only patch data for the month you wish to mutate on Cloud BQ.

All of this can be very neatly orchestrated with a coupling between Cloud Composer and Cloud Functions.

Upvotes: 0

Related Questions