MM1883
MM1883

Reputation: 66

Reading and writing to bigquery in gcp. What service?

I'm creating a bigquery table where I join and transform data from several other bigquery tables. It's all written in sql and the whole query takes about 20 minutes to run and consists of several sql scripts. I'm also creating some intermediate tables before the end table is created.

Now I want to make above query more robust and schedule it and I cant decide on the tool. Alternatives I'm thinking about.

  1. Make it into a dataflow job and schedule with cloud scheduler. This feels like it might be overkill because all the code is in SQL and from bq --> bq.

  2. Create scheduled queries to load the data. No experience with this but seems quiet nice

  3. Create a python script that executed all the sql using the BQ API. Create a cron job and schedule it to run somewhere in GCP.

Any suggestions on what would be a preferred solution?

Upvotes: 0

Views: 301

Answers (2)

rodvictor
rodvictor

Reputation: 339

According to my experience with GCP, both Cloud Composer and Dataflow jobs would be, as you suggested, overkill. None of these products would be serverless and would probably imply a higher economic cost because of the instances running in the background.

On the other hand, you can create scheduled queries on a regular basis (daily, weekly, etc) that are separated by a big enough time window to make sure the queries are carried out in the expected order. In this sense, the final table would be constructed correctly from the intermediate ones.

From my point of view, both executing a Python Script and sending notifications to Pub/Sub triggering a Cloud Function (as apw-ub suggested) are also good options.

All in all, I guess the final decision should depend more on your personal preference. Please feel free to use the Google Cloud Pricing Calculator (1) for having an estimate of how costly each of the options would be.

Upvotes: 0

apw-ub
apw-ub

Reputation: 86

If it's encapsulated in a single script (or even multiple) I'd schedule it through BQ. It will handle your query no different than the other options so it doesn't make sense to set up extra services for it.

Are you able to run it as a single query?

Upvotes: 1

Related Questions