Cyrille Dakhlia

How does PostgreSQL deal with performance when there are millions of entries?

It may be a silly basic question, but as described in the title, I am wondering how PostgreSQL deals with performance when a table has millions of entries (possibly reaching a billion).

To put it more concretely, I want to store data (audio, photos and videos) in my database (I am only storing their paths; the files themselves are organised in the file system), but I have to decide whether to use a single table "data" to store all the different types of data, or multiple tables ("data_audio", "data_photos", "data_videos") to separate those types.

The reason I am asking is that roughly 95% of my entries are photos and only 5% are audio and videos, and when I query my database for an audio entry, I don't want it to be slowed down by all the photo entries (searching for a row among a thousand must be different from searching among a million). So I would like to know how PostgreSQL deals with this and whether there is a way to get the best optimisation.

I have read this topic, which is really interesting and seems relevant: How does database indexing work?

Is that the way I should do it?

Recap of the core information I will store in my core tables:

1st option:

(SEARCH queries are mainly done on the DATA._timestamp and ACTIVITIES.name fields, after filtering the data by USERS.id)

2nd option (only replacing the previous DATA table with the following three tables and keeping all the other tables):
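For illustration, the two options might look roughly like the sketch below (only USERS.id, DATA._timestamp and ACTIVITIES.name come from the description above; every other column name is a placeholder):

    -- 1st option: a single DATA table with a type discriminator
    CREATE TABLE data (
        id         bigserial PRIMARY KEY,
        user_id    bigint NOT NULL REFERENCES users (id),
        type       text NOT NULL CHECK (type IN ('audio', 'photo', 'video')),
        path       text NOT NULL,  -- the file itself lives in the file system
        _timestamp timestamptz NOT NULL
    );

    -- 2nd option: one table per data type, no discriminator column needed
    CREATE TABLE data_audio (
        id         bigserial PRIMARY KEY,
        user_id    bigint NOT NULL REFERENCES users (id),
        path       text NOT NULL,
        _timestamp timestamptz NOT NULL
    );
    -- data_photos and data_videos would have the same structure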

Additional question: Is it a good idea to have one database per user? (In my scenario, whether you may query data depends on whether you have permission, and to retrieve data from two different users you have to ask both users for permission. The permission process is a process in its own right and is not handled here, so let's say that any given query always targets a single user's data.)

I hope I have been clear. Thanks in advance for any help or advice!

Cyrille

Answers (1)

Laurenz Albe

Answers:

  • PostgreSQL is cool with millions and billions of rows.

  • If the different types of data all have the same attributes and are the same from the database perspective (have the same relationships to other tables etc.), then keep them in one table. If not, use different tables.

  • The speed of index access to a table does not depend on the size of the table (a minimal sketch follows this list).

  • If the data of different users have connections, like they use common base tables or you want to be able to join tables for different users, it is best to keep them in different schemas in one database. If it is important that they be separated no matter what, keep them in different databases.

    It is also an option to keep data for different users in one table, if you use Row Level Security or let your application take care of it (see the second sketch after this list).

    This decision depends strongly on your use case and architecture.

    Warning: don't create clusters with thousands of databases and databases with thousands of schemas. That causes performance problems in the catalogs.
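A minimal sketch of the point about index access, assuming the single-table schema sketched in the question (the index names are made up):

    -- A B-tree lookup costs O(log n) page reads, so the index depth
    -- grows very slowly even as the table reaches millions of rows.
    CREATE INDEX data_type_timestamp_idx ON data (type, _timestamp);

    -- A partial index goes further: it only contains the ~5% of rows
    -- that are audio, so the millions of photo rows are not in it at all.
    CREATE INDEX data_audio_idx ON data (_timestamp) WHERE type = 'audio';

    -- This query can use the partial index; EXPLAIN shows the chosen plan.
    EXPLAIN
    SELECT path
    FROM data
    WHERE type = 'audio'
      AND _timestamp >= '2023-01-01';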
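And a minimal sketch of the Row Level Security option, assuming all users share one data table (the owner_name column and the policy name are made up):

    -- Tag each row with its owner so PostgreSQL can filter per user.
    ALTER TABLE data ADD COLUMN owner_name text NOT NULL DEFAULT current_user;

    ALTER TABLE data ENABLE ROW LEVEL SECURITY;

    -- Each user sees and modifies only their own rows.
    -- Note: the table owner bypasses RLS unless you also run
    -- ALTER TABLE data FORCE ROW LEVEL SECURITY.
    CREATE POLICY per_user_data ON data
        USING (owner_name = current_user);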
