ValiantCoder

Reputation: 33

BigQuery with a large number of tables

I know there has already been a question about table number limits, but it was vague... In a dataset I want to create about 1-2 million tables. This happens because I want to split my users' activity table into smaller tables: one table per user. And over time this number will keep growing. As I understand it, there will be no problem from BigQuery's perspective... but I'm concerned that I will not be able to access (list) those tables from the browser (https://bigquery.cloud.google.com/queries/appname), because the tables are not grouped by time (as tables with a time range are) and they all get listed in an endless scroll (possibly blocking the browser).

Thank you for any suggestions

Upvotes: 2

Views: 901

Answers (2)

Mikhail Berlyant

Reputation: 172944

… the problem is that the browser will get blocked while listing all tables in the dataset

You can use the "?minimal" parameter to limit the load operation to 30,000 tables per project, so the browser will not be blocked. For example:

https://bigquery.cloud.google.com/queries/<your_project_name>?minimal

See more about Display limits.

I can't easily explore my dataset because of this (and query them)

If you are planning to have 2+ million tables in the same dataset, even if the Web UI were to show them to you without being blocked, I really doubt you would be able to reasonably explore them visually. There are just too many objects to "swallow".
Btw, this is not only a human-specific issue - even querying such a "long" table list programmatically can be problematic. See more about Using meta-tables.
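As a rough illustration of the meta-tables approach mentioned above, here is a sketch of a legacy-SQL query against a dataset's `__TABLES__` meta-table, which has one row per table. The dataset name is a placeholder:

```python
# Sketch of a legacy-SQL meta-table query. "mydataset" is a placeholder;
# each BigQuery dataset exposes a __TABLES__ meta-table whose columns
# include table_id, row_count and size_bytes.
dataset = "mydataset"
query = (
    "SELECT table_id, row_count, size_bytes "
    f"FROM [{dataset}.__TABLES__] "
    "ORDER BY table_id LIMIT 1000"
)
print(query)
```

Even this route runs into scale problems with millions of tables, which is the point being made above.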

because the tables are not grouped by time (like in the case of tables with timerange) and they get all listed in an endless scroll (possibly blocking the browser)

That’s right: in the BigQuery Web UI, tables are grouped only if they follow the table_prefixYYYYMMDD pattern. Even if you mapped your userID namespace onto YYYYMMDD values, you would still be out of luck, as your group would still consist of those millions of tables.

Thank you for any suggestions

BigQuery supports Partitioned Tables, which allow multiple partitions in the same table. Unfortunately, as of today, only date-partitioned tables are supported, but from what I heard the BigQuery team plans to add partitioning by an arbitrary column.
This would probably fit your desired design, unless there is a limitation on column cardinality.
Meantime, if you want, you can experiment with applying your design using the date-partitioned tables feature by mapping userid to YYYYMMDD (~9999*12*30 >> 3+ million users).
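The userid-to-YYYYMMDD mapping suggested above can be sketched like this. It is a minimal illustration, assuming a numeric user ID that is simply treated as a day offset from the earliest date the YYYYMMDD namespace can express; the function name is hypothetical:

```python
from datetime import date, timedelta

# Hypothetical scheme: treat a numeric user ID as a day offset from the
# earliest date expressible as YYYYMMDD, giving one partition per user.
EPOCH = date(1, 1, 1)        # 00010101, first usable "partition"
LAST = date(9999, 12, 31)    # 99991231, last usable "partition"

def userid_to_partition(user_id: int) -> str:
    """Return the YYYYMMDD partition suffix for a given user ID."""
    d = EPOCH + timedelta(days=user_id)
    if d > LAST:
        raise ValueError("user ID exceeds the YYYYMMDD namespace")
    return f"{d.year:04d}{d.month:02d}{d.day:02d}"

# Total number of distinct partitions available under this scheme -
# roughly 3.65 million, matching the "3+ million users" estimate above.
capacity = (LAST - EPOCH).days + 1
print(userid_to_partition(0), capacity)
```

Note that this only demonstrates the namespace mapping; whether partition counts that large are practical is a separate question.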

My recommendation:

  1. Play/experiment with partitioned tables as I suggested in the previous (above) section
  2. Sharding (splitting) tables in BigQuery into millions of tables sounds extremely impractical to me. You should revisit your design. What is it that you are trying to address with such sharding? Try to focus on this and, if needed, post a specific question here on SO!

Upvotes: 3

Marlon Abeykoon

Reputation: 12465

As an alternative solution, you can use the Google Cloud SDK client.

You can read the documentation for the bq command-line tool here.

e.g. bq ls [project_id:][dataset_id] to list all tables.
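For completeness, the same listing can be driven from a script. This is a sketch that only builds the bq invocation, assuming an installed and authenticated Cloud SDK; the project and dataset names are placeholders:

```python
import subprocess

# Sketch: list tables in a dataset via the bq CLI. "my_project" and
# "my_dataset" are placeholders. -n / --max_results raises the default
# cap on how many tables bq ls returns.
cmd = ["bq", "ls", "--max_results", "100000", "my_project:my_dataset"]
print(" ".join(cmd))
# subprocess.run(cmd, check=True)  # uncomment when bq is available
```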

NOTE: The maximum number of tables per query is limited to 1,000.

Upvotes: 0
