Reputation: 24260
I'm in the process of rolling out a new feature on my webapp that will ultimately result in users having the ability to create dynamic tables in the database. Over time I expect that this may result in thousands, or tens of thousands of tables being created.
I understand that Postgres doesn't have an explicit limit on the number of tables in a database, but that performance might degrade if that number gets too large. To mitigate this I'm thinking of breaking up the underlying storage into either different databases or different schemas. My main question is: is one of those choices better than the other? If so, why? Schemas seem easier to implement, but I'm not sure whether they will actually solve some of the potential longer-term performance issues that might come up.
Note that the tables are completely independent, so there is no need to join them with other tables.
Also, assume I'm handling any validation needed to keep malicious and/or unexpected users from causing trouble when creating database tables.
Upvotes: 0
Views: 74
Reputation: 27424
From the Database File Layout section of the manual:
Each table and index is stored in a separate file.
So this is the first point to take into account: you need a filesystem that copes well with a large number of files in a single directory, unless you use different tablespaces to spread them out.
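For illustration, a minimal sketch of placing a dynamically created table in its own tablespace (the directory, tablespace and table names are made up, psycopg2 is assumed as the driver, and CREATE TABLESPACE needs a superuser and a pre-existing directory):

```python
import psycopg2

# Assumptions: /mnt/disk1/pg_user_tables already exists, is owned by the
# postgres OS user, and the connecting role is a superuser
# (CREATE TABLESPACE requires one and cannot run inside a transaction block).
conn = psycopg2.connect("dbname=app user=postgres")
conn.autocommit = True
cur = conn.cursor()

cur.execute("CREATE TABLESPACE user_space_01 LOCATION '/mnt/disk1/pg_user_tables'")

# A table placed in that tablespace gets its file under /mnt/disk1
# instead of the default data directory.
cur.execute("""
    CREATE TABLE user_table_0001 (
        id   bigserial PRIMARY KEY,
        data jsonb
    ) TABLESPACE user_space_01
""")
```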
Note that you can use different tablespaces even within the same schema or the same database, so the choice of separate schemas would have to be motivated by other reasons, such as having tables with the same name (schemas in PostgreSQL are really just a way of partitioning the namespace).
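A minimal sketch of the one-schema-per-user idea (all names are placeholders; psycopg2's sql module is assumed, since it quotes identifiers safely, which matters given your validation concern):

```python
import psycopg2
from psycopg2 import sql

conn = psycopg2.connect("dbname=app user=app_user")
cur = conn.cursor()

# One schema per end user; the table name can be identical in every schema,
# because schemas only partition the namespace.
for user_id in (1, 2, 3):
    schema = sql.Identifier(f"user_{user_id}")  # quoted safely as an identifier
    cur.execute(sql.SQL("CREATE SCHEMA IF NOT EXISTS {}").format(schema))
    cur.execute(
        sql.SQL(
            "CREATE TABLE IF NOT EXISTS {}.measurements "
            "(id bigserial PRIMARY KEY, payload jsonb)"
        ).format(schema)
    )

conn.commit()
```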
As for databases, I think a single database would serve you well, since I assume each additional database introduces non-trivial overhead (for instance, each one needs its own connections and carries its own catalog).
Finally: since the system keeps track of all these objects through its own catalog, which is itself a set of relational tables, I suppose it will scale quite well; at most you might need to check whether the catalog tables have suitable indexes, and add them if they are not present.
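If you want to see what the catalog already tracks, a couple of read-only queries against pg_catalog are enough (connection details are placeholders):

```python
import psycopg2

conn = psycopg2.connect("dbname=app user=app_user")
cur = conn.cursor()

# How many ordinary tables the catalog currently tracks (relkind 'r' = plain table).
cur.execute("SELECT count(*) FROM pg_catalog.pg_class WHERE relkind = 'r'")
print("tables:", cur.fetchone()[0])

# The indexes that already exist on pg_class itself.
cur.execute("""
    SELECT indexname
    FROM pg_indexes
    WHERE schemaname = 'pg_catalog' AND tablename = 'pg_class'
""")
for (name,) in cur.fetchall():
    print("pg_class index:", name)
```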
One last piece of advice: before investing time and resources in the project, simulate it by programmatically generating a few thousand tables, filling them with random data, and exercising them under the load you expect from your system.
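A minimal sketch of such a simulation, assuming psycopg2, with the table layout, names and row counts as placeholders for whatever your app will really create:

```python
import random
import string
import psycopg2
from psycopg2 import sql
from psycopg2.extras import execute_values

N_TABLES = 5000        # placeholder: match your expected table count
ROWS_PER_TABLE = 1000  # placeholder: match your expected table size

conn = psycopg2.connect("dbname=loadtest user=app_user")
cur = conn.cursor()

for i in range(N_TABLES):
    table = sql.Identifier(f"dyn_table_{i:05d}")
    cur.execute(
        sql.SQL(
            "CREATE TABLE {} (id bigserial PRIMARY KEY, label text, value double precision)"
        ).format(table)
    )
    rows = [
        ("".join(random.choices(string.ascii_lowercase, k=12)), random.random())
        for _ in range(ROWS_PER_TABLE)
    ]
    execute_values(
        cur,
        sql.SQL("INSERT INTO {} (label, value) VALUES %s").format(table).as_string(conn),
        rows,
    )
    if i % 100 == 0:
        conn.commit()  # commit in batches so a single failure does not undo everything

conn.commit()
```

Once the tables are in place, run your typical query mix against them and watch planning times and the size of the catalog as the table count grows.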
Upvotes: 1