Andre
Andre

Reputation: 788

Liked Posts Design Specifics

So I've found through researching myself that the best way I can design a structure for liking posts is by having a database like the following. Let's say like Reddit, a post can be upvoted, downvoted, or not voted on at all.

The database would then having three columns, [username,post,liked].

Liked could be some kind of boolean, 1 indicating liked, and 0 indicating disliked.

Then to find a post like amount, I would do SELECT COUNT(*) FROM likes WHERE post=12341 AND liked=1 for example, then do the same for liked=0(disliked), and do the addition server side along with controversy percentage.

So I have a few concerns, first off, what would be the appropriate way to find out if a user liked a post? Would I try to select the liked boolean value, and either retrieve or catch error. Or would I first check if the record exist, and then do another select to find out the value? What if I want to check if a user liked multiple posts at once?

Secondly, would this table not need a primary key? Because no row will have the same post and username, should I use a compound primary key?

Upvotes: 3

Views: 1032

Answers (4)

Nathan Hawks
Nathan Hawks

Reputation: 587

For performance you will want to alter your database plans:

User Likes Post table

Fields:

  • Liked should be a boolean, you are right. You can transform this to -1/+1 in your code. You will cache the numeric totals elsewhere.
  • Username should be UserID. You want only numeric values in this table for speed.
  • Post should be PostID for the same reason.

You also want a numeric primary key because they're easier to search against, and to perform sub-selects with.

And create a unique index on (Username, Post), because this table is mainly an index built for speed.

So did a user vote on a post?

select id 
from user_likes_post 
where userID = 123 and postID = 456;

Did the user like the post?

select id 
from user_likes_post 
where userID = 123 and postID = 456 and liked = true;

You don't need to worry about errors, you'll either get results or you won't, so you might as well go straight to the value you're after:

select liked from user_liked_post where userID=123 and postID=456

Get all the posts they liked:

select postID
from user_likes_post
where userID = 123 and liked = true;

Post Score table

  • PostID
  • TotalLikes
  • TotalDislikes
  • Score

This second table will be dumped and refreshed every n minutes by calculating on the first table. This second table is your cached aggregate score that you'll actually load for all users visiting that post. Adjust the frequency of this repeat dump-and-repopulate schedule however you see fit. For a small hobby or student project, just do it every 30 seconds or 2 minutes; bigger sites, every 10 or 15 minutes. For an even bigger site like reddit, you'd want to make the schema more complex to allow busier parts of the site to have faster refresh.

// this is not exact code, just an outline

totalLikes = 
 select count(*) 
  from user_likes_post 
  where postID=123 and liked=true
totalDislikes = 
 select count(*) 
  from user_likes_post 
  where postID=123 and liked=false
totalVotes = totalLikes + totalDislikes
score = totalLikes / totalVotes;

(You can simulate an update by involving the user's localStorage -- client-side Javascript showing a bump-up or down on the posts that user has voted on.)

Upvotes: 2

displayName
displayName

Reputation: 14399

My initial thought was that the problem is because boolean type is not rich enough to express the possible reactions to a post. So instead of boolean, you needed an enum with possible states of Liked, Disliked, and the third and the default state of Un-reacted.

Now however it seems, you can do away with boolean too because you do not need to record the Un-reacted state. A lack of reaction means that you do not add the entry in the table.


What would be the appropriate way to find out if a user liked a post?

SELECT Liked
FROM Likes
WHERE Likes.PostId == 1234
    AND Likes.UserName == "UniqueUserName";

If the post was not interacted with by the user, there would be no results. Otherwise, 1 if liked and 0 if disliked.


What if I want to check if a user liked multiple posts at once?

I think for that you need to store a timestamp too. You can then use that timestamp to see if it there are multiple liked post within a short duration.

You could employ k-means clustering to figure if there are any "cluster" of likes. The complete explanation is too big to add here.


Would this table not need a primary key?

Of course it would. But Like is a weak entity depending upon the Post. So it would require the PK of Post, which is the field post (I assume). Combined with username we would have the PK because (post, username) would be unique for user's reaction.

Upvotes: 0

artoodetoo
artoodetoo

Reputation: 938

I agree with Rick James that likes table should be uniquely indexed by (username, post) pair.
Also I advise you to let a bit redundancy and keep the like_counter in the posts table. It will allow you to significantly reduce the load on regular queries.
Increase or decrease the counter right after successful adding the like/dislike record.

All in all,

  • to get posts with likes: plain select of posts
    no need to add joins and aggregate sub-queries.
  • to like/dislike: (1) insert into likes, on success (2) update posts.like_counter.
    unique index prevents duplication.
  • get know if user has already liked the post: select from likes by username+post pair.
    index helps to do it fast

Upvotes: 0

Rick James
Rick James

Reputation: 142560

Given your suggested 3-column table and the selects you suggest, be sure to have

PRIMARY KEY(username, post)   -- helps with "did user like a post"
INDEX(post_id, liked)  -- for that COUNT

When checking whether a user liked a post, either do a LEFT JOIN so that you get one of three things: 1=liked, 0=unliked, or NULL=not voted. Or you could use EXISTS( SELECT .. )

Tables need PKs.

Upvotes: 0

Related Questions