Reputation: 17913
There are several ways to load data into BigQuery, e.g. bulk import from GCS, streaming inserts, and others.
In many cases, one needs to shard the data being loaded, e.g. by date, or by an arbitrary key, in order to produce smaller tables that are faster to query, or to get around the per-table import quotas.
A recently introduced feature, template tables, makes such sharding very easy for streaming: you just specify a suffix for the name of the table you want to stream to, on a per-record basis.
Is this BigQuery feature available for other import modes, most importantly for import from GCS? It would be very useful for importing large amounts of data into BigQuery in a sharded way, which is a common use case, e.g. when using Cloud Dataflow for batch jobs.
Upvotes: 1
Views: 418
Reputation: 26637
No, template tables are not available for bulk import at this time. The rationale is that bulk import can already create tables as a side effect, so template tables aren't necessary there.
For streaming imports, the semantics are a bit trickier. Since streaming insert requests don't specify a schema, if the destination table doesn't exist, BigQuery doesn't know what the desired schema of the table should be. Template tables allow the streaming system to look up the desired schema from somewhere else.
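To make the streaming case concrete, here is a minimal sketch of how sharded streaming requests might be assembled. It assumes the `templateSuffix` and `rows` fields of the `tabledata.insertAll` request body, and that the suffix is set once per request, so rows are grouped by suffix first. The helper name, the sample records, and the `"_" + day` suffix scheme are made up for illustration, and nothing is actually sent to BigQuery.

```python
from collections import defaultdict


def build_insert_all_bodies(records, suffix_of):
    """Group records by shard suffix and build one insertAll body per suffix.

    Assumption: the suffix applies to a whole request, so each distinct
    suffix gets its own request body.
    """
    by_suffix = defaultdict(list)
    for record in records:
        by_suffix[suffix_of(record)].append({"json": record})
    # No schema appears anywhere in the body: BigQuery is expected to copy
    # it from the template table when it creates <table><suffix>.
    return [
        {"templateSuffix": suffix, "rows": rows}
        for suffix, rows in sorted(by_suffix.items())
    ]


# Hypothetical sample data, sharded by day.
records = [
    {"user": "a", "day": "20150601"},
    {"user": "b", "day": "20150602"},
    {"user": "c", "day": "20150601"},
]
bodies = build_insert_all_bodies(records, lambda r: "_" + r["day"])
```

Each body would then be posted to the `insertAll` endpoint of the template table, which is the table whose schema gets reused for the suffixed tables.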
For bulk loads, however, the schema is generally included as part of the request, or can be inferred from the data, so template tables don't make as much sense.
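As a rough illustration of sharding without template tables, the sketch below builds one load job configuration per shard, each carrying its own schema and a destination table that is created if it doesn't exist. It assumes the `configuration.load` fields of the jobs API (`sourceUris`, `destinationTable`, `schema`, `createDisposition`, `writeDisposition`, `sourceFormat`). The project, dataset, table, and bucket names are hypothetical, as is the assumption that the files in GCS are already split by day.

```python
def build_load_job(project, dataset, base_table, suffix, source_uris, schema_fields):
    """Build a load job body targeting the shard table <base_table><suffix>."""
    return {
        "configuration": {
            "load": {
                "sourceUris": source_uris,
                "sourceFormat": "NEWLINE_DELIMITED_JSON",
                "destinationTable": {
                    "projectId": project,
                    "datasetId": dataset,
                    "tableId": base_table + suffix,
                },
                # The schema travels with the request, so the shard table
                # can be created as a side effect of the load.
                "schema": {"fields": schema_fields},
                "createDisposition": "CREATE_IF_NEEDED",
                "writeDisposition": "WRITE_APPEND",
            }
        }
    }


# Hypothetical schema and per-day input files.
schema_fields = [
    {"name": "user", "type": "STRING"},
    {"name": "day", "type": "STRING"},
]
days = ["20150601", "20150602"]
jobs = [
    build_load_job(
        "my-project", "logs", "events", "_" + day,
        ["gs://my-bucket/events/%s/*.json" % day],
        schema_fields,
    )
    for day in days
]
```

This gives one job per shard, so the data has to be partitioned by shard key before loading; a load job won't route individual records to different tables.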
All this said, we're well aware that management of multiple sharded tables is inconvenient, and hope to have some improvements ready soon.
Upvotes: 1