Reputation: 704
I have a single table in BigQuery time partitioned by day . Dataflow job uses streaming API to insert new records continuously, but only to the newest partitions (two in the corner case, when data comes slightly out of order on the border of days).
On the other side, I query the table a lot aggregating historical months, not touching the most recent days, i.e. the streaming buffer as well.
I would like to leverage the caching of the results of such queries. Streaming to the table unfortunately disables the cache, even though theoretically the cached results are not influenced by the streamed rows.
How do I use caching on historical partitions while still be able to stream to the newest partitions?
If it is impossible out of the box, is it a good design to:
If yes, how would I define such a view that will use caching if only "historical" data is queried? Or would I need to have my own query rewrite tool?
Maybe you have other ideas?
Upvotes: 1
Views: 510
Reputation: 1098
You cannot use caching and streaming contemporaneously. There is already a feature request asking for the same you want.
As a workaround ,as you said, you need two different tables and use data redundancy. I agree with the method you posted:
Check here how to "manage partitioned tables". There is a list of use cases based on bq cp
command which could help you to merge "recent" table into the "historical" one.
Upvotes: 1