John Smith

Reputation: 4363

PostgreSQL - "Ten most frequent entries"

I want to frequently retrieve the top X users who have sent the most messages. What would be the optimal solution for this, in terms of both developer experience (DX) and performance?

The solutions I see myself:

Please tell me whatever you think might be the best solution.

Upvotes: 1

Views: 48

Answers (2)

jjanes

Reputation: 44285

Some databases have incrementally updated views, where you create a view like in your example 3, and it automatically keeps it updated like in your example 2. PostgreSQL does not have this feature.

For your option 1, it seems pretty darn clean to me. Hard to get much simpler than that. Yes, it could have performance problems, but how fast do you really need it to be? You should make sure you actually have a problem before worrying about solving it.
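For reference, a minimal sketch of what such a plain GROUP BY query could look like. The schema is an assumption, since the question does not show one: the `users` and `message` tables, their columns, and the helper `top_senders` are all hypothetical. It uses Python's built-in `sqlite3` only so that the example runs without a server; the SELECT itself should work unchanged in PostgreSQL.

```python
import sqlite3

# Hypothetical schema: table and column names are assumptions for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE message (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users(id),
        body TEXT NOT NULL
    );
    -- An index on the grouping column may help the aggregate on large tables.
    CREATE INDEX message_user_id_idx ON message (user_id);
""")
conn.executemany("INSERT INTO users (id, name) VALUES (?, ?)",
                 [(1, "alice"), (2, "bob"), (3, "carol")])
conn.executemany("INSERT INTO message (user_id, body) VALUES (?, ?)",
                 [(1, "a"), (1, "b"), (1, "c"), (2, "d"), (2, "e"), (3, "f")])


def top_senders(conn, limit):
    """Return (name, message_count) rows for the users with the most messages."""
    return conn.execute(
        """
        SELECT u.name, count(*) AS message_count
        FROM message AS m
        JOIN users AS u ON u.id = m.user_id
        GROUP BY u.id, u.name
        ORDER BY message_count DESC, u.id  -- u.id breaks ties deterministically
        LIMIT ?
        """,
        (limit,),
    ).fetchall()


print(top_senders(conn, 2))  # [('alice', 3), ('bob', 2)]
```

Running the query under EXPLAIN (ANALYZE) on realistic data volumes is a reasonable way to find out whether it is already fast enough.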

For your option 2, what you are looking for is a trigger. For each insertion, it would increment a count in the user table. If you ever delete, you would also need to decrement the count. Also, if you ever update an existing entry to change its user, the trigger would need to decrement the count of the old user and increment that of the new user. This will reduce concurrency: if two processes try to insert messages from the same user at the same time, one will block until the other finishes. This may not matter much to you. Also, the mere existence of triggers imposes some CPU overhead, plus whatever the trigger itself actually does. But unless your server is already overloaded, this might not matter.
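The three cases above (insert, delete, and reassigning a message to another user) can be sketched as follows. This is only an illustration under assumptions: the `users` and `message` tables, the `message_count` column, and the trigger names are hypothetical, and SQLite (via Python's built-in `sqlite3`) stands in for PostgreSQL so the example is runnable. In PostgreSQL the same logic would live in a trigger function (for example in PL/pgSQL) attached with CREATE TRIGGER, so the syntax differs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema; message_count is the redundant, trigger-maintained column.
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        message_count INTEGER NOT NULL DEFAULT 0
    );
    CREATE TABLE message (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users(id),
        body TEXT NOT NULL
    );

    -- Insert: increment the sender's count.
    CREATE TRIGGER message_count_ins AFTER INSERT ON message
    BEGIN
        UPDATE users SET message_count = message_count + 1 WHERE id = NEW.user_id;
    END;

    -- Delete: decrement the sender's count.
    CREATE TRIGGER message_count_del AFTER DELETE ON message
    BEGIN
        UPDATE users SET message_count = message_count - 1 WHERE id = OLD.user_id;
    END;

    -- Reassignment: move one unit from the old user to the new user.
    CREATE TRIGGER message_count_upd AFTER UPDATE OF user_id ON message
    WHEN OLD.user_id <> NEW.user_id
    BEGIN
        UPDATE users SET message_count = message_count - 1 WHERE id = OLD.user_id;
        UPDATE users SET message_count = message_count + 1 WHERE id = NEW.user_id;
    END;
""")

conn.executemany("INSERT INTO users (id, name) VALUES (?, ?)",
                 [(1, "alice"), (2, "bob")])
conn.executemany("INSERT INTO message (id, user_id, body) VALUES (?, ?, ?)",
                 [(1, 1, "a"), (2, 1, "b"), (3, 2, "c")])
conn.execute("UPDATE message SET user_id = 2 WHERE id = 1")  # alice -> bob
conn.execute("DELETE FROM message WHERE id = 2")             # alice's last message


def top_senders_cached(conn, limit):
    """Read the top senders from the stored counts, with no aggregation."""
    return conn.execute(
        "SELECT name, message_count FROM users "
        "ORDER BY message_count DESC, id LIMIT ?",
        (limit,),
    ).fetchall()


print(top_senders_cached(conn, 2))  # [('bob', 2), ('alice', 0)]
```

Each trigger's UPDATE on the `users` row is where the blocking described above would come from in PostgreSQL: two concurrent inserts for the same user both need to lock that one row.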

Your option 3 doesn't make much sense to me, at least not in PostgreSQL. There is no performance benefit, and it would act to obscure rather than clarify what is going on. Anyone who can't understand a GROUP BY is probably going to have even more problems understanding a view which exists only to do a GROUP BY.

Another option is a materialized view. But you will see stale data from them between refreshes. For some uses that is acceptable, for some it is not.

Upvotes: 2

Laurenz Albe

Reputation: 247235

The first and third solutions are essentially the same, since a view is nothing but a “crystallized” query.

The second solution would definitely make for faster queries, but at the price of storing redundant data. The disadvantages of such an approach are:

  • You run the risk of inconsistent data. You can reduce that risk somewhat by using triggers that automatically keep the data synchronized.

  • The performance of modifications of message will be worse, because the trigger will have to be executed, and each modification will also modify users (that is the natural place to keep such a count).

The decision should be based on the question whether the GROUP BY query will be fast enough for your purposes. If yes, use it and avoid the above disadvantages. If not, consider storing the extra count.

Upvotes: 1
