Reputation: 33
I know there has already been a question regarding the table number limits, but it was vague... In a dataset I want to create about 1-2 milion tables. This happens because I want to split my users activity table into smaller tables; for each user a table. And in time this number will keep on growing. As I understand there will be no problem from BigQuery's perpective...but i'm concerned that I will not be able to access (list) those datasets from browser (https://bigquery.cloud.google.com/queries/appname); because the tables are not grouped by time (like in the case of tables with timerange) and they get all listed in an endless scroll (possibly blocking the browser)
Thank you for any suggestions
Upvotes: 2
Views: 901
Reputation: 172944
… the problem is that the browser will get blocked while listing all tables in the dataset
You can use the "?minimal" parameter to limit the load operation to 30,000 tables per project, so browser will not be blocked. For example:
https://bigquery.cloud.google.com/queries/<your_project_name>?minimal.
see more about Display limits
I can't easily explore my dataset because of this (and query them)
If you are planning to have 2+ million tables in same dataset, even if Web UI were to show them to you without being blocked - I really doubt you would be able to somehow reasonably visually explore them. Just too many objects to “swallow”
Btw, this is not only human specific issue - even querying such "long" tables list programmatically can be problematic. See more about Using meta-tables
because the tables are not grouped by time (like in the case of tables with timerange) and they get all listed in an endless scroll (possibly blocking the browser)
That’s right, in BigQuery Web UI tables will be grouped only if they follow table_preffixYYYYMMDD pattern. Even if you would map your userID namespace to YYYYMMDD value – you would still be out of luck as your group still will consists of those millions tables.
Thank you for any suggestions
BigQuery supports Partitioned Tables which allows to have multiple partitions in the same table. Unfortunately, as of today, only Date-Partitioned tables are supported, but from what I heard BigQuery Team plans to add partitioning by arbitrary column.
This would probably fit to your desired design, unless there will be a limitation to column cardinality.
Meantime, if you want, you can experiment with applying your design using date-partitioned tables feature by mapping userid to YYYYMMDD (~9999*12*30 >> 3+ million users)
My recommendation:
Upvotes: 3
Reputation: 12465
As an alternative solution for this you can use Google cloud sdk client.
You can read the documentation for this bq Command-Line tool here.
eg: bq ls [project_id:][dataset_id]
to list all tables.
NOTE: Maximum tables per query is limited to 1000. Refer
Upvotes: 0