kruvi
kruvi

Reputation: 524

SQL Server: Help in table design

I have an application that displays posts, and for each post the users can say if they like the post or not. For each post I should display how many users likes it and how many dont like it. Suppose I have these tables:

CREATE TABLE [dbo].[Post](
[Id] [bigint] IDENTITY(1,1) NOT NULL,
[Quotation] [text] NOT NULL,
CONSTRAINT [PK_Post] PRIMARY KEY CLUSTERED 
(
    [Id] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF,     ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]

CREATE TABLE [dbo].[UserPostAction]( --Every action of the user (like or dislike) is     recorded to this table
[PostId] [bigint] NOT NULL,
[UserId] [bigint] NOT NULL,
[ActionValue] [int] NOT NULL, --Like / Dislike
CONSTRAINT [PK_UserPostAction] PRIMARY KEY CLUSTERED 
(
[PostId] ASC,
[UserId] ASC,    
 )WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF,      ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
  ) ON [PRIMARY]

What is the best way, performance wise, to have a select statement that returns a list of posts which contains the like/dislike statistics already in:

A) Add 2 more columns to Post table: TotalLike and TotalDislike, and when inserting a new record to UserPostAction I will update these columns? This way, when selecting posts from Post table I will have the statistics already calculated.

B) Add a third table: PostStat ([PostId], [TotalLike], [TotalDislike]) and update the table in a batch process. The select statement will use Inner Join on Post.Id = PostStat.PostId to retrieve the statistics.

C) Any other way you can think of.

Hopefully, I will have a DB with millions of posts, and the Post table will be queried very often by a lot of users. Keep in mind that the statistics is updated very frequently, because a lot of users will like or dislike a lot of posts. It means (maybe) performing a lot of updates and table locking, which will interfere with the select statement that returns posts to be displayed.

Any idea is welcome.

Upvotes: 1

Views: 145

Answers (2)

acorello
acorello

Reputation: 4653

I faced a similar situation and I choose the strategy A:

  • Putting that information on the main table is the "normal" choice given that you'll have a one-to-one relationship between Post and PostStat.
  • you won't have to do a join between the Post and PostStats tables every time you query Posts, that I assume is more expensive than accessing a single table containing the same columns scattered in the two.
  • if you'll update the like/dislike counters with a trigger (that's what I did), having the column on the parent table will make life so much easier... In fact you'll just have to write an "update post set likes = likes + 1 where postId = :new.postId" (oracle sql syntax), and the referential integrity on postId will guarantee that you have a row to update there.

Knowing that writes don't block reads (in fact they will just read the previous value until the transaction is completed) I doubt you'll have locking issues putting the columns on the main table.

Upvotes: 1

Diego
Diego

Reputation: 36126

I will have a DB with millions of posts

don't think this is enough reason to maintain duplicate data on your DB.

You already have all the info you need on your current tables. Its a matter of building the right indexes and you can easily count how many users liked\unliked the post.

If you are really unconformable with this approach, I suggest implementing option A with triggers on the UserPostAction table to maintain your 2 new columns.

Upvotes: 1

Related Questions