Frankie Drake

Reputation: 1388

Exporting large file from BigQuery to Google cloud using wildcard

I have an 8 GB table in BigQuery that I'm trying to export to Google Cloud Storage (GCS). If I specify the URI as is, I get an error:

Errors:
Table gs://***.large_file.json too large to be exported to a single file. Specify a uri including a * to shard export. See 'Exporting data into one or more files' in https://cloud.google.com/bigquery/docs/exporting-data. (error code: invalid)

Okay... so I specify * in the file name, but it exports into only 2 files: one of 7.13 GB and one of ~150 MB.

UPD. I thought I would get about 8 files of 1 GB each? Am I wrong, or what am I doing wrong?

P.S. I tried this in the web UI as well as with the Java client library.

Upvotes: 3

Views: 8617

Answers (2)

Tiago Peres

Reputation: 15632

To export it to GCS, go to the table and click EXPORT > Export to GCS.

BigQuery export table

This opens the following screen

Export to GCS

In Select GCS location you define the bucket, the folder and the file.

For instance, suppose you have a bucket named daria_bucket (use only lowercase letters, numbers, hyphens (-), and underscores (_); dots (.) may be used to form a valid domain name) and want to save the file(s) in the root of the bucket with the name test. Then you write (in Select GCS location)

daria_bucket/test.csv

Because the file is too big, you're getting an error. To fix it, you'll have to break the export into multiple files using a wildcard. So you'll need to add *, like this:

daria_bucket/test*.csv

Wildcard export to GCS

This is going to store, inside the bucket daria_bucket, all the data extracted from the table in more than one file, named test000000000000.csv, test000000000001.csv, test000000000002.csv, and so on.
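The same sharded export can also be run from the command line. A minimal sketch using the bq CLI, assuming a hypothetical dataset/table my_dataset.my_table and the daria_bucket bucket from the example above:

```shell
# Export the table to sharded CSV files in GCS.
# my_dataset.my_table and daria_bucket are placeholders - substitute your own.
bq extract \
  --destination_format=CSV \
  'my_dataset.my_table' \
  'gs://daria_bucket/test*.csv'
```

BigQuery replaces the * with a zero-padded shard number, producing the test000000000000.csv, test000000000001.csv, ... files described above.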

In my case (more than a year after you asked the question), a random table of 1.25 GB was exported as 16 files of about 80.3 MB each.

Upvotes: 5

Felipe Hoffa

Reputation: 59355

For files above a certain size, BigQuery will export to multiple GCS files - that's why it asks for the "*" glob.

Once you have multiple files in GCS, you can join them into 1 with the compose operation:

gsutil compose gs://bucket/obj1 [gs://bucket/obj2 ...] gs://bucket/composite
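For example, assuming the sharded export produced objects named test000000000000.csv and test000000000001.csv in daria_bucket (names borrowed from the other answer, not from this one), a sketch of merging them back into one object:

```shell
# Compose the exported shards into a single GCS object.
# daria_bucket and the shard names are assumptions for illustration.
gsutil compose \
  gs://daria_bucket/test000000000000.csv \
  gs://daria_bucket/test000000000001.csv \
  gs://daria_bucket/test-merged.csv
```

Note that compose accepts at most 32 source components per call, and for CSV exports each shard carries its own header row, so the merged file will contain repeated headers unless you export with headers disabled.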

Upvotes: 3
