Neroksi

Reputation: 1398

Build a Docker image for Google BigQuery

I have a huge amount of data (hundreds of gigabytes) on Google BigQuery, and for ease of use (many post-query treatments) I'm working with the bigquery Python package. The problem is that I have to run all my queries again whenever I shut my laptop down, which is very expensive as my dataset is about one terabyte. I have thought about Google Compute Engine, but that is a poor solution as I would still be paying for the machines unless I stop them. My last idea is to run a Docker image on our own sandbox, which is cheaper and could do exactly what I'm looking for. So I would like to know if someone has ever built a Docker image for BigQuery? Thanks for helping!

Upvotes: 5

Views: 5206

Answers (1)

chadf

Reputation: 188

We package all of our Python/BigQuery projects into Docker containers and push them to Google Container Registry (GCR).
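As a rough illustration, a container for such a project can be as simple as the Dockerfile below. This is only a minimal sketch; `requirements.txt` and `run_queries.py` are placeholder names for your own dependency file and query script.

```dockerfile
# Minimal sketch: package a Python/BigQuery project into a Docker image.
FROM python:3.7-slim

WORKDIR /app

# Install the BigQuery client library and any other project dependencies
# listed in a (hypothetical) requirements.txt.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the query scripts into the image.
COPY . .

# Default command: run the batch of queries (placeholder script name).
CMD ["python", "run_queries.py"]
```

Assuming your Docker client is authenticated against GCR (for example via `gcloud auth configure-docker`), you would then build and push the image with `docker build -t gcr.io/YOUR_PROJECT/bq-jobs:latest .` followed by `docker push gcr.io/YOUR_PROJECT/bq-jobs:latest`.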

Automated scheduling, dependency graphing, and logging can be handled with Google Cloud Composer (Airflow). It's pretty simple to set up, and Airflow has a KubernetesPodOperator that allows you to specify a Python file to run in your Docker image on GCR (see the sketch below). You can use this workflow to make sure all of your queries and Python scripts are run on GCP without having to worry about Google Compute Engine, or any devops type of things.
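Here is a minimal DAG sketch showing how the KubernetesPodOperator could launch that image on a schedule, assuming Airflow 1.x (as used by Cloud Composer at the time); the project ID, image name, and script name are placeholders from the Dockerfile example above.

```python
# Minimal sketch of a Cloud Composer (Airflow 1.x) DAG that runs a Python
# script from a Docker image stored in GCR on a daily schedule.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.contrib.operators.kubernetes_pod_operator import KubernetesPodOperator

default_args = {
    "owner": "airflow",
    "start_date": datetime(2019, 1, 1),
    "retries": 1,
    "retry_delay": timedelta(minutes=5),
}

with DAG("bigquery_batch_jobs",
         default_args=default_args,
         schedule_interval="@daily") as dag:

    run_queries = KubernetesPodOperator(
        task_id="run_bigquery_queries",
        name="run-bigquery-queries",
        namespace="default",
        # Hypothetical image built and pushed to GCR as shown above.
        image="gcr.io/YOUR_PROJECT/bq-jobs:latest",
        cmds=["python"],
        arguments=["run_queries.py"],
    )
```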

https://cloud.google.com/composer/docs/how-to/using/using-kubernetes-pod-operator
https://cloud.google.com/composer/

Upvotes: 4
