Killswitch
Killswitch

Reputation: 346

MySQL- Counting rows VS Setting up a counter

I have 2 tables posts<id, user_id, text, votes_counter, created> and votes<id, post_id, user_id, vote>. Here the table vote can be either 1 (upvote) or -1(downvote). Now if I need to fetch the total votes(upvotes - downvotes) on a post, I can do it in 2 ways.

  1. Use count(*) to count the number of upvotes and downvotes on that post from votes table and then do the maths.
  2. Set up a counter column votes_counter and increment or decrement it everytime a user upvotes or downvotes. Then simply extract that votes_counter.

My question is which one is better and under what condition. By saying condition, I mean factors like scalability, peaktime et cetera.

To what I know, if I use method 1, for a table with millions of rows, count(*) could be a heavy operation. To avoid that situation, if I use a counter then during peak time, the votes_counter column might get deadlocked, too many users trying to update the counter!

Is there a third way better than both and as simple to implement?

Upvotes: 2

Views: 445

Answers (2)

Sergey Kalinichenko
Sergey Kalinichenko

Reputation: 726589

The two approaches represent a common tradeoff between complexity of implementation and speed.

  • The first approach is very simple to implement, because it does not require you to do any additional coding.
  • The second approach is potentially a lot faster, especially when you need to count a small percentage of items in a large table
  • The first approach can be sped up by well designed indexes. Rather than searching through the whole table, your RDBMS could retrieve a few records from the index, and do the counts using them

The second approach can become very complex very quickly:

  • You need to consider what happens to the counts when a user gets deleted
  • You should consider what happens when the table of votes is manipulated by tools outside your program. For example, merging records from two databases may prove a lot more complex when the current counts are stored along with the individual ones.

I would start with the first approach, and see how it performs. Then I would try optimizing it with indexing. Finally, I would consider going with the second approach, possibly writing triggers to update counts automatically.

Upvotes: 5

MrLore
MrLore

Reputation: 3780

As this sounds a lot like StackExchange, I'll refer you to this answer on the meta about the database schema used on the site. The votes table looks like this:

Votes table:

  • Id
  • PostId
  • VoteTypeId, one of the following values:

    1 - AcceptedByOriginator
    2 - UpMod
    3 - DownMod
    4 - Offensive
    5 - Favorite (if VoteTypeId = 5, UserId will be populated)
    6 - Close
    7 - Reopen
    8 - BountyStart (if VoteTypeId = 8, UserId will be populated)
    9 - BountyClose
    10 - Deletion
    11 - Undeletion
    12 - Spam
    15 - ModeratorReview  
    16 - ApproveEditSuggestion
    
  • UserId (only present if VoteTypeId is 5 or 8)

  • CreationDate
  • BountyAmount (only present if VoteTypeId is 8 or 9)

And so based on that it sounds like the way it would be run is:

SELECT VoteTypeId FROM Votes WHERE VoteTypeId = 2 OR VoteTypeId = 3

And then based on the value, do the maths:

int score = 0;
for each vote in voteQueryResults
    if(vote == 2) score++;
    if(vote == 3) score--;

Even with millions of results, this is probably going to be a very fast operation as it's so simple.

Upvotes: 1

Related Questions