Reputation: 407
This may be a basic question, but as the title says, I am wondering how PostgreSQL handles performance when a table holds millions of entries (with the possibility of reaching a billion).
To put it more concretely, I want to store data (audio, photos and videos) in my database (I only store their paths; the files themselves are organised in the file system), but I have to decide whether to use a single table "data" for all the different types of data, or multiple tables ("data_audio", "data_photos", "data_videos") to separate those types.
The reason I am asking is that about 95% of my entries are photos and only 5% are audio and videos, and when I query the database for an audio entry, I don't want the query to be slowed down by all the photo entries (searching for a row among a thousand is surely different from searching among a million). So I would like to know how PostgreSQL deals with this and whether there is a way to get the best optimisation.
I have read this topic, which is really interesting and seems relevant: How does database indexing work?
Is indexing the approach I should take?
Recap of the core information I will store in my main tables:
1st option:
DATA TABLE (containing audio, photos and videos):
USERS TABLE:
DATA USERS RELATION TABLE:
ACTIVITIES TABLE:
DATA ACTIVITIES RELATION TABLE:
(SEARCH queries are mainly done on the DATA._timestamp and ACTIVITIES.name fields, after filtering the data by USERS.id)
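For illustration, the 1st option could look something like this (apart from _timestamp, name and id, which are mentioned above, all column definitions are placeholders, since the real ones are omitted here):

```sql
-- 1st option: one "data" table for all three media types
-- (illustrative columns only).
CREATE TABLE data (
    id         bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    type       text NOT NULL,   -- 'audio', 'photo' or 'video'
    path       text NOT NULL,   -- location of the file on disk
    _timestamp timestamptz NOT NULL
);

CREATE TABLE users (
    id   bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    name text NOT NULL
);

-- many-to-many relation between data and users
CREATE TABLE data_users (
    data_id bigint REFERENCES data (id),
    user_id bigint REFERENCES users (id),
    PRIMARY KEY (data_id, user_id)
);

CREATE TABLE activities (
    id   bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    name text NOT NULL
);

-- many-to-many relation between data and activities
CREATE TABLE data_activities (
    data_id     bigint REFERENCES data (id),
    activity_id bigint REFERENCES activities (id),
    PRIMARY KEY (data_id, activity_id)
);
```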
2nd option (only replacing the previous DATA TABLE with the following three tables and keeping all the other tables):
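(Again with placeholder columns; the relation tables would then have to reference these per-type tables instead of a single data table:)

```sql
-- 2nd option: one table per media type, replacing "data".
CREATE TABLE data_audio (
    id         bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    path       text NOT NULL,
    _timestamp timestamptz NOT NULL
);

-- same structure for the other two types
CREATE TABLE data_photos (LIKE data_audio INCLUDING ALL);
CREATE TABLE data_videos (LIKE data_audio INCLUDING ALL);
```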
Additional question: is it a good idea to have one database per user? (In my scenario, whether you may query data depends on permissions: to retrieve data from two different users, you have to ask both users for permission. The permission process is a process in its own right and is not handled here, so let's say that any given query always targets a single user's data.)
I hope I have been clear; thanks in advance for any help or advice!
Cyrille
Upvotes: 0
Views: 1132
Reputation: 246788
Answers:
PostgreSQL is cool with millions and billions of rows.
If the different types of data all have the same attributes and are the same from the database's perspective (they have the same relationships to other tables, and so on), keep them in one table. If not, use separate tables.
The speed of index access to a table barely depends on the size of the table: a B-tree lookup grows logarithmically with the row count, so finding a row among a billion takes only a few more page reads than finding one among a thousand.
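For the 95%/5% skew you describe, a partial index is one way to make sure an audio search never even touches index entries for photos. A minimal sketch, assuming the single-table design with hypothetical type, path and _timestamp columns:

```sql
-- Partial index covering only the ~5% of rows that are audio;
-- its size and depth are determined by the audio rows alone.
CREATE INDEX data_audio_ts_idx
    ON data (_timestamp)
    WHERE type = 'audio';

-- A query whose WHERE clause implies the index predicate can use it:
SELECT path
FROM data
WHERE type = 'audio'
  AND _timestamp >= '2024-01-01';
```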
If the data of different users are connected (for example, they share common base tables, or you want to be able to join tables belonging to different users), it is best to keep them in different schemas in one database. If it is important that they be separated no matter what, keep them in different databases.
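A minimal sketch of the schema-per-user variant (schema and table names are made up for illustration):

```sql
-- One schema per user inside a single database.
CREATE SCHEMA user_cyrille;

CREATE TABLE user_cyrille.data (
    id         bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    path       text NOT NULL,
    _timestamp timestamptz NOT NULL
);

-- Point the session at one user's schema; unqualified table names
-- then resolve to that user's tables.
SET search_path = user_cyrille;
SELECT path FROM data WHERE _timestamp >= '2024-01-01';
```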
It is also an option to keep data for different users in one table, if you use Row Level Security or let your application take care of it.
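With the shared-table approach, Row Level Security could look roughly like this; the user_id column and the app.current_user_id setting are assumptions for the sketch, not something PostgreSQL defines for you:

```sql
-- Assume the shared "data" table has a user_id column.
ALTER TABLE data ENABLE ROW LEVEL SECURITY;

-- Each session sees only the rows of the user it declares;
-- note that the table owner bypasses RLS unless you also use
-- ALTER TABLE ... FORCE ROW LEVEL SECURITY.
CREATE POLICY per_user_isolation ON data
    USING (user_id = current_setting('app.current_user_id')::bigint);

-- The application identifies the user before querying:
SET app.current_user_id = '42';
SELECT path FROM data;  -- returns only user 42's rows
```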
This decision depends strongly on your use case and architecture.
Warning: don't create clusters with thousands of databases and databases with thousands of schemas. That causes performance problems in the catalogs.
Upvotes: 1