Reputation: 407
This may be a basic question, but as the title says, I am wondering how PostgreSQL handles performance when a table holds millions of entries (with the possibility of reaching a billion).
To put it more concretely, I want to store data (audio, photos and videos) in my database (I only store their paths; the files themselves are organised in the file system), but I have to decide whether to use a single table "data" for all the different types of data, or multiple tables ("data_audio", "data_photos", "data_videos") to separate those types.
The reason I am asking is that about 95% of my entries are photos and only 5% are audio and videos, and when I query the database for an audio entry, I don't want the query to be slowed down by all the photo entries (searching for a row among a thousand is surely different from searching among a million). So I would like to know how PostgreSQL deals with this and whether there is a way to get the best optimisation.
I have read this topic, which is really interesting and seems relevant: How does database indexing work?
Is indexing the approach I should take?
Recap of the core information I will store in my main tables:
1st option:
DATA TABLE (containing audio, photos and videos):
USERS TABLE:
DATA USERS RELATION TABLE:
ACTIVITIES TABLE:
DATA ACTIVITIES RELATION TABLE:
(SEARCH queries are mainly done on the DATA._timestamp and ACTIVITIES.name fields, after filtering the data by USERS.id)
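For illustration, the 1st option could look something like this (apart from _timestamp, name and id, which are mentioned above, all column definitions are placeholders, since the real ones are omitted here):

```sql
-- 1st option: one "data" table for all three media types
-- (illustrative columns only).
CREATE TABLE data (
    id         bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    type       text NOT NULL,   -- 'audio', 'photo' or 'video'
    path       text NOT NULL,   -- location of the file on disk
    _timestamp timestamptz NOT NULL
);

CREATE TABLE users (
    id   bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    name text NOT NULL
);

-- many-to-many relation between data and users
CREATE TABLE data_users (
    data_id bigint REFERENCES data (id),
    user_id bigint REFERENCES users (id),
    PRIMARY KEY (data_id, user_id)
);

CREATE TABLE activities (
    id   bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    name text NOT NULL
);

-- many-to-many relation between data and activities
CREATE TABLE data_activities (
    data_id     bigint REFERENCES data (id),
    activity_id bigint REFERENCES activities (id),
    PRIMARY KEY (data_id, activity_id)
);
```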
2nd option (only replacing the previous DATA TABLE with the following three tables and keeping all the other tables):
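(Again with placeholder columns; the relation tables would then have to reference these per-type tables instead of a single data table:)

```sql
-- 2nd option: one table per media type, replacing "data".
CREATE TABLE data_audio (
    id         bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    path       text NOT NULL,
    _timestamp timestamptz NOT NULL
);

-- same structure for the other two types
CREATE TABLE data_photos (LIKE data_audio INCLUDING ALL);
CREATE TABLE data_videos (LIKE data_audio INCLUDING ALL);
```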
Additional question: is it a good idea to have one database per user? (In my scenario, whether you may query data depends on permissions: to retrieve data from two different users, you have to ask both users for permission. The permission process is a process in its own right and is not handled here, so let's say that any given query always targets a single user's data.)
I hope I have been clear; thanks in advance for any help or advice!
Cyrille
Upvotes: 0
Views: 1132
Reputation: 246788
Answers:
PostgreSQL is cool with millions and billions of rows.
If the different types of data all have the same attributes and are the same from the database's perspective (they have the same relationships to other tables, and so on), keep them in one table. If not, use separate tables.
The speed of index access to a table barely depends on the size of the table: a B-tree lookup grows logarithmically with the row count, so finding a row among a billion takes only a few more page reads than finding one among a thousand.
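For the 95%/5% skew you describe, a partial index is one way to make sure an audio search never even touches index entries for photos. A minimal sketch, assuming the single-table design with hypothetical type, path and _timestamp columns:

```sql
-- Partial index covering only the ~5% of rows that are audio;
-- its size and depth are determined by the audio rows alone.
CREATE INDEX data_audio_ts_idx
    ON data (_timestamp)
    WHERE type = 'audio';

-- A query whose WHERE clause implies the index predicate can use it:
SELECT path
FROM data
WHERE type = 'audio'
  AND _timestamp >= '2024-01-01';
```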
If the data of different users are connected (for example, they share common base tables, or you want to be able to join tables belonging to different users), it is best to keep them in different schemas in one database. If it is important that they be separated no matter what, keep them in different databases.
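A minimal sketch of the schema-per-user variant (schema and table names are made up for illustration):

```sql
-- One schema per user inside a single database.
CREATE SCHEMA user_cyrille;

CREATE TABLE user_cyrille.data (
    id         bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    path       text NOT NULL,
    _timestamp timestamptz NOT NULL
);

-- Point the session at one user's schema; unqualified table names
-- then resolve to that user's tables.
SET search_path = user_cyrille;
SELECT path FROM data WHERE _timestamp >= '2024-01-01';
```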
It is also an option to keep data for different users in one table, if you use Row Level Security or let your application take care of it.
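With the shared-table approach, Row Level Security could look roughly like this; the user_id column and the app.current_user_id setting are assumptions for the sketch, not something PostgreSQL defines for you:

```sql
-- Assume the shared "data" table has a user_id column.
ALTER TABLE data ENABLE ROW LEVEL SECURITY;

-- Each session sees only the rows of the user it declares;
-- note that the table owner bypasses RLS unless you also use
-- ALTER TABLE ... FORCE ROW LEVEL SECURITY.
CREATE POLICY per_user_isolation ON data
    USING (user_id = current_setting('app.current_user_id')::bigint);

-- The application identifies the user before querying:
SET app.current_user_id = '42';
SELECT path FROM data;  -- returns only user 42's rows
```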
This decision depends strongly on your use case and architecture.
Warning: don't create clusters with thousands of databases and databases with thousands of schemas. That causes performance problems in the catalogs.
Upvotes: 1