Reputation: 47
I have a schema stored in BigQuery and I need to transfer that data to an Amazon S3 bucket. What is the process? The BigQuery data belongs to another organization, which is willing to grant the proper access for the transfer. I found ways to import from an S3 bucket into GCP, but no way to export from BigQuery to S3 directly, and I'm really stuck on this part. I also need to schedule this process, because the data in BigQuery changes and I want it delivered to my S3 bucket every day. Please mention references for that. Note that each day's data will be in the TBs.
Upvotes: 3
Views: 6292
Reputation: 3341
You can use BigQuery connections to export to S3 directly with an EXPORT DATA query. Here is the documentation for this feature: https://cloud.google.com/bigquery/docs/reference/standard-sql/export-statements
Sample code:
EXPORT DATA
  WITH CONNECTION `myproject.us.myconnection`
  OPTIONS(
    uri = 's3://bucket/folder/*',
    format = 'JSON',
    overwrite = true)
AS
SELECT field1, field2 FROM mydataset.table1 ORDER BY field1 LIMIT 10
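Since the question also asks about running this every day: an EXPORT DATA statement can itself be put into a BigQuery scheduled query. A rough sketch with the bq CLI, reusing the connection and names from the sample above (the display name and schedule are placeholders of mine):
bq query \
  --use_legacy_sql=false \
  --display_name='daily_export_to_s3' \
  --schedule='every 24 hours' \
  'EXPORT DATA WITH CONNECTION `myproject.us.myconnection`
   OPTIONS(uri="s3://bucket/folder/*", format="JSON", overwrite=true)
   AS SELECT field1, field2 FROM mydataset.table1'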
Upvotes: 0
Reputation: 316
I believe this is possible directly using the EXPORT DATA query. Ref: Transferring BigQuery data to Amazon S3 bucket
EXPORT DATA WITH CONNECTION `CONNECTION_REGION.CONNECTION_NAME`
OPTIONS(uri="s3://BUCKET_NAME/PATH", format="FORMAT", ...)
AS QUERY
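For reference, a filled-in version of that template might look like this (the connection, bucket and table names are placeholders I made up; as I understand it the connection has to live in a BigQuery Omni AWS region such as aws-us-east-1, and PARQUET is one of the supported formats, so check the export-statements documentation for your case):
EXPORT DATA WITH CONNECTION `aws-us-east-1.s3_export_conn`
OPTIONS(uri="s3://my-bucket/exports/*", format="PARQUET")
AS SELECT * FROM mydataset.mytable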
Upvotes: 0
Reputation: 601
AFAIK there is no direct BigQuery-to-S3 pipeline, but the workaround is BigQuery to Google Cloud Storage, then GCS to S3.
The first step is to set up the gcloud SDK on the machine where the job will be scheduled.
Configure gcloud and pay special attention to the boto configuration (a sketch is just below). You can skip this step if the process is going to run on an EC2 instance with a role attached that allows it to write to S3 (which is also a better security practice than using keys).
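For the boto part: gsutil picks up the AWS credentials it uses for s3:// URLs from the ~/.boto file, so a minimal config sketch looks like this (the key values are placeholders):
[Credentials]
aws_access_key_id = <YOUR_AWS_ACCESS_KEY_ID>
aws_secret_access_key = <YOUR_AWS_SECRET_ACCESS_KEY>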
Copy the data from BigQuery to GCS:
bq --location=US extract --destination_format AVRO --compression SNAPPY [source_table] [target location on gcs]
Copy the files from GCS to S3:
gsutil rsync -r [target location on gcs] [target location on s3]
Note: this process works fine for GA360 data because it is already partitioned by day. If that is not the case and you can't separate the tables, you can schedule a query on GCP that creates the table with incremental logic and use that as the [source_table].
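To run this every day, the two commands can be wrapped in a small script and scheduled with cron on that machine. A rough sketch, assuming a day-partitioned source table and placeholder bucket names:
#!/usr/bin/env bash
# daily_bq_to_s3.sh - export yesterday's partition and sync it to S3
set -euo pipefail

DS=$(date -d "yesterday" +%Y%m%d)

# extract one day's partition from BigQuery to GCS (wildcard URI because a single day is TBs)
bq --location=US extract \
  --destination_format AVRO \
  --compression SNAPPY \
  "mydataset.mytable\$${DS}" \
  "gs://my-staging-bucket/export/${DS}/part-*.avro"

# sync the staging prefix to S3 (credentials come from ~/.boto or the attached EC2 role)
gsutil -m rsync -r \
  "gs://my-staging-bucket/export/${DS}/" \
  "s3://my-s3-bucket/export/${DS}/"

# crontab entry to run it daily at 06:00
# 0 6 * * * /opt/scripts/daily_bq_to_s3.sh >> /var/log/bq_to_s3.log 2>&1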
Update: AWS has released a BigQuery Glue connector that can do this.
Upvotes: 5