richjcooper

Reputation: 51

Loading from Google cloud storage to Big Query seems slow

I'm running a test using BigQuery. I have 50,000 files, each about 27 MB on average; some are larger, some smaller.

Timing each file upload reveals:

real 0m49.868s
user 0m0.297s
sys 0m0.173s

Using something similar to:

time bq load --encoding="UTF-8" --field_delimiter="~" dataset gs://project/b_20130630_0003_1/20130630_0003_4565900000.tsv schema.json

Running "bq ls -j" and then "bq show -j " on the job shows the following error:

Job Type   State     Start Time        Duration   Bytes Processed
load       FAILURE   01 Jul 22:21:18   0:00:00

Errors encountered during job execution. Exceeded quota: too many imports per table for this table

After checking the table, the rows seem to have loaded fine, which is puzzling: given the error, I would have expected nothing to be loaded. I also don't understand how I reached my quota limit, since I only started uploading files recently and thought the limit was 200,000 requests.

All the data is already on Google Cloud Storage, so I would expect loading to be fairly quick, since the transfer is between Cloud Storage and BigQuery, both of which are in the cloud.

By my calculations, loading one file at a time would take 50,000 * 49 seconds = 2,450,000 seconds, or roughly 28 days.

Kinda hoping these numbers are wrong.

Thanks.

Upvotes: 4

Views: 3165

Answers (1)

Jordan Tigani

Reputation: 26617

The quota limit per table is 1000 loads per day. This is to encourage people to batch their loads, since we can generate a more efficient representation of the table if we can see more of the data at once.
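To put that quota next to the numbers in the question, here is a rough back-of-the-envelope calculation. It assumes the 1000 loads per day figure above and the 50,000 files from the question; the variable names are just for illustration.

```shell
files=50000          # number of files, from the question
loads_per_day=1000   # per-table daily load quota quoted above

# Smallest batch size that fits every file into one day's quota
# (ceiling division: round up so no file is left over).
files_per_job=$(( (files + loads_per_day - 1) / loads_per_day ))
echo "files per load job to finish within one day's quota: $files_per_job"

# Days needed if every file is submitted as its own load job.
days_single=$(( (files + loads_per_day - 1) / loads_per_day ))
echo "days needed at one file per load job: $days_single"
```

Both figures come out at 50: one file per job means about 50 days of quota, while 50 or more files per job fits within a single day's quota.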

BigQuery can perform load jobs in parallel. Depending on the size of your load, a number of workers will be assigned to your job. If your files are large, they will be split among workers; alternatively, if you pass multiple files in one job, each worker may process a different file. So the time it takes to load one file is not indicative of the time it takes to run a load job with multiple files.
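A minimal sketch of batching along those lines, assuming you pass several Cloud Storage URIs to one `bq load` as a comma-separated list. The helper names `batch_uris` and `print_load_commands` and the table name `mydataset.mytable` are made up for illustration, and the second function only prints the commands (a dry run) rather than running them.

```shell
#!/usr/bin/env bash

# Read URIs (one per line) from stdin and print comma-separated
# batches of at most $1 URIs, one batch per output line.
batch_uris() {
  local size="$1" count=0 batch="" uri
  while IFS= read -r uri; do
    [ -z "$uri" ] && continue          # skip blank lines
    if [ "$count" -eq 0 ]; then
      batch="$uri"
    else
      batch="$batch,$uri"
    fi
    count=$((count + 1))
    if [ "$count" -eq "$size" ]; then
      printf '%s\n' "$batch"
      count=0
      batch=""
    fi
  done
  # Emit the final, possibly smaller, batch.
  if [ "$count" -gt 0 ]; then
    printf '%s\n' "$batch"
  fi
}

# Dry run: print one bq load command per batch instead of executing it.
# mydataset.mytable is a placeholder; the flags mirror the question.
print_load_commands() {
  local size="$1" batch
  batch_uris "$size" | while IFS= read -r batch; do
    printf 'bq load --encoding=UTF-8 --field_delimiter="~" mydataset.mytable %s schema.json\n' "$batch"
  done
}
```

Something like `gsutil ls gs://project/b_20130630_0003_1/ | print_load_commands 50` would then print 1,000 commands for 50,000 files, which you can inspect before running. As far as I know, `bq load` also accepts a `*` wildcard in the Cloud Storage URI, which may be simpler if the files share a prefix.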

Upvotes: 3
