I've been struggling to load big chunks of data into BigQuery for a while now. In Google's docs I see the insertAll method, which seems to work fine, but it gives me 413 "Entity too large" errors when I try to send anything over about 100 KB of JSON data. Per Google's docs, I should be able to send up to 1 TB of uncompressed data in JSON. What gives? The example on the previous page has me building the request body manually instead of using insertAll, which is uglier and more error-prone. I'm also not sure what format the data should be in for that case.
So, all of that said, what is the clean, proper way to load lots of data into BigQuery? An example with data would be great. If at all possible, I'd rather not build the request body myself.
The example here uses a resumable upload to send a CSV file. Although the file in the example is small, the same code should work for virtually any size of upload, since it relies on a robust media upload protocol. It sounds like you want JSON, so you'd need to tweak the code slightly (there's a JSON example, load_json.py, in the same directory). If you have a stream you want to upload instead of a file, you can use a MediaInMemoryUpload instead of the MediaFileUpload used in the example.
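As a rough sketch of the JSON variant, assuming the google-api-python-client library (`googleapiclient`): BigQuery's JSON load format is newline-delimited JSON (one object per line, not a single JSON array), so in-memory rows can be serialized that way and wrapped in a `MediaInMemoryUpload`. The helper names `to_newline_delimited_json` and `make_media_body` are hypothetical, and newer client releases mark `MediaInMemoryUpload` as deprecated in favor of `MediaIoBaseUpload`, so check against the version you have installed.

```python
import json
from typing import Any, Dict, Iterable


def to_newline_delimited_json(rows: Iterable[Dict[str, Any]]) -> bytes:
    """Serialize rows as newline-delimited JSON: one compact JSON object per line."""
    # NEWLINE_DELIMITED_JSON expects each record on its own line, not a JSON array.
    lines = (json.dumps(row, separators=(",", ":")) + "\n" for row in rows)
    return "".join(lines).encode("utf-8")


def make_media_body(rows: Iterable[Dict[str, Any]]):
    """Hypothetical helper: wrap serialized rows for use as jobs.insert's media_body."""
    # Imported lazily so the serializer above works without the client library installed.
    from googleapiclient.http import MediaInMemoryUpload

    return MediaInMemoryUpload(
        to_newline_delimited_json(rows),
        mimetype="application/octet-stream",
        resumable=True,  # let the client split large payloads into chunks
    )
```

For example, `to_newline_delimited_json([{"a": 1}, {"b": "x"}])` produces `b'{"a":1}\n{"b":"x"}\n'`.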
BTW, Craig's answer is correct; I just thought I'd chime in with links to sample code.
Note that for streaming data to BQ, anything above 10k rows/sec requires talking to a sales rep.
If you'd like to send large chunks directly to BQ, you can send them via POST. If you're using a client library, it should handle making the upload resumable for you. To do this, you'll need to call jobs.insert() instead of tabledata.insertAll() and provide a description of a load job. To actually push the bytes using the Python client, you can create a MediaFileUpload or MediaInMemoryUpload and pass it as the media_body parameter.
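A minimal sketch of the jobs.insert route, assuming the google-api-python-client library and a `service` object created with `googleapiclient.discovery.build("bigquery", "v2", credentials=...)`. The load-job description follows the BigQuery v2 REST `Job` resource; the helper names and the `WRITE_APPEND` write disposition are illustrative choices, not requirements.

```python
from typing import Any, Dict, List


def build_load_job_body(project_id: str, dataset_id: str, table_id: str,
                        fields: List[Dict[str, str]]) -> Dict[str, Any]:
    """Describe a load job for jobs.insert (BigQuery v2 REST representation)."""
    return {
        "configuration": {
            "load": {
                "sourceFormat": "NEWLINE_DELIMITED_JSON",
                "destinationTable": {
                    "projectId": project_id,
                    "datasetId": dataset_id,
                    "tableId": table_id,
                },
                "schema": {"fields": fields},
                # Append to the table if it already has rows.
                "writeDisposition": "WRITE_APPEND",
            }
        }
    }


def start_load_from_file(service, project_id: str, body: Dict[str, Any], path: str):
    """Hypothetical helper: push a local newline-delimited JSON file as the media body."""
    from googleapiclient.http import MediaFileUpload

    media = MediaFileUpload(path, mimetype="application/octet-stream", resumable=True)
    # execute() drives the resumable upload to completion and returns the Job resource;
    # poll jobs.get until status.state == "DONE" to learn whether the load succeeded.
    return service.jobs().insert(projectId=project_id, body=body, media_body=media).execute()
```

The job runs asynchronously on the BigQuery side, so a successful `insert` only means the bytes were accepted; load errors show up later in the job's `status`.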
The other option is to stage the data in Google Cloud Storage and load it from there.
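For the Cloud Storage route, the load job description swaps the uploaded media for `sourceUris`, so no `media_body` is passed. This is a sketch under the same assumptions (google-api-python-client, a `service` built for `bigquery` v2); the helper names are hypothetical.

```python
from typing import Any, Dict, Iterable, List


def build_gcs_load_job_body(project_id: str, dataset_id: str, table_id: str,
                            source_uris: Iterable[str],
                            fields: List[Dict[str, str]]) -> Dict[str, Any]:
    """Describe a load job that reads newline-delimited JSON already staged in GCS."""
    return {
        "configuration": {
            "load": {
                # gs:// URIs of the staged files; BigQuery reads them directly.
                "sourceUris": list(source_uris),
                "sourceFormat": "NEWLINE_DELIMITED_JSON",
                "destinationTable": {
                    "projectId": project_id,
                    "datasetId": dataset_id,
                    "tableId": table_id,
                },
                "schema": {"fields": fields},
            }
        }
    }


def start_load_from_gcs(service, project_id: str, body: Dict[str, Any]) -> Dict[str, Any]:
    # No media_body here: the bytes are already in Cloud Storage.
    return service.jobs().insert(projectId=project_id, body=body).execute()
```

Usage might look like `build_gcs_load_job_body("my-project", "my_dataset", "my_table", ["gs://my-bucket/exports/part-1.json"], fields)`, where the project, dataset, table, and bucket path are all placeholders.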