Reputation: 789
I'm thinking about organizing a BigQuery dataset with multiple tables that I can query using wildcards.
In this use case, there will potentially be 1000+ tables, each one representing a different customer. I need to be able to query them individually, and I also need to query them all at the same time.
I'm thinking of using multiple tables rather than a single table with clustering because, when running queries for specific customers only, I would like to scan just the data that I need.
The Quotas and limits page lists the following limits that look applicable:
- Maximum number of tables referenced per query: 1,000
- Maximum resolved query length: 12 MB
My questions are:
1) When running a query with a wildcard (such as SELECT * FROM `project_id.dataset_id.table_*`), do all the individual tables that match the wildcard get counted against this 1,000-table limit?
2) What exactly is counted against the 12 MB limit? For example, suppose I have the tables `project_id.dataset_id.table_1`, `project_id.dataset_id.table_2`, and `project_id.dataset_id.table_3`. If I query `table_*`, what will constitute the resolved query length? Does it include the fully qualified names of all matching tables (`project_id.dataset_id.table_1`, `project_id.dataset_id.table_2`, `project_id.dataset_id.table_3`)? If I query `table_*` but include a filter on `_TABLE_SUFFIX` (for example, WHERE _TABLE_SUFFIX = "1"), does this eliminate tables that don't match the filter from the resolved query?
3) Are there any other limitations that I should be aware of?
Thank you in advance.
Upvotes: 1
Views: 1799
Reputation: 173210
When running a query with wildcard, do all the individual tables that match the wildcard get counted against this 1000 tables limit?
All tables that match the wildcard and the _TABLE_SUFFIX filter will count against the 1,000-table limit.
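To illustrate the point above, a minimal sketch of a wildcard query with a suffix filter (table and dataset names are the ones from the question); per this answer, only the tables surviving the filter count toward the 1,000-table limit:

```sql
-- The wildcard matches table_1, table_2, table_3, but the
-- _TABLE_SUFFIX filter narrows the set to table_1 only.
SELECT *
FROM `project_id.dataset_id.table_*`
WHERE _TABLE_SUFFIX = "1";
```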
What exactly is counted against the 12 MB limit?
The combined length of the fully qualified names of all tables counted against the 1,000-table limit explained above is what counts against the 12 MB limit.
Are there any other limitations that I should be aware of?
Suggestion: a combination of partitioning and clustering will serve you much better in the use case you describe in your question, especially now that you can partition by integer range.
This will give you superior flexibility in maintaining and processing the data compared with the sharded-tables approach.
Obviously, there are still limitations related to partitioned tables to consider.
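As a sketch of the suggestion above, here is one way a single integer-range-partitioned, clustered table could look (the table name, column names, and bucket boundaries are assumptions for illustration, not from the question):

```sql
-- Hypothetical schema: one table for all customers, partitioned by
-- integer ranges of customer_id and clustered on customer_id so that
-- per-customer queries scan only the relevant partitions/blocks.
CREATE TABLE `project_id.dataset_id.customer_events`
(
  customer_id INT64,
  event_ts    TIMESTAMP,
  payload     STRING
)
PARTITION BY RANGE_BUCKET(customer_id, GENERATE_ARRAY(0, 2000, 10))
CLUSTER BY customer_id;
```

With this layout, a filter such as `WHERE customer_id = 42` lets BigQuery prune partitions and clustered blocks instead of requiring one physical table per customer.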
Upvotes: 2
Reputation: 208042
My advice is against this approach. You will quickly reach 1,000 tables, and then you cannot query them all at once. The other issue is that keeping each customer in a separate table becomes unmaintainable.
You can use clustering instead.
If one of your columns is tenant_id, then you can use it for clustering.
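A minimal sketch of the clustering idea above, assuming a tenant_id column (the table and other column names are hypothetical):

```sql
-- Single table clustered on tenant_id; filtering on the clustering
-- column reduces the data scanned for per-tenant queries.
CREATE TABLE `project_id.dataset_id.events`
(
  tenant_id STRING,
  event_ts  TIMESTAMP,
  payload   STRING
)
CLUSTER BY tenant_id;

-- Per-tenant query: benefits from cluster pruning.
SELECT *
FROM `project_id.dataset_id.events`
WHERE tenant_id = 'customer_42';

-- All-tenants query: just omit the filter, no wildcard needed.
SELECT * FROM `project_id.dataset_id.events`;
```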
Upvotes: 2