Reputation:
I'm looking for the best way to store a set of "posts" as well as comments on those posts in SQL. Imagine a design similar to a "Wall" on Facebook where users can write posts on their wall and other users can comment on those posts. I need to be able to display all wall posts as well as the comments.
When I first started out, I came up with a table such as:
CREATE Table wallposts
(
id uuid NOT NULL,
posted timestamp NOT NULL,
userid uuid NOT NULL,
posterid uuid NOT NULL,
parentid uuid NOT NULL,
comment text NOT NULL
)
id is unique, parentid will be null on original posts and point to an id if the row is a comment on an existing post. Easy enough and super fast to insert new data. However, doing a select which would return me:
POST 1
COMMENT 1
COMMENT 2
POST 2
COMMENT 1
COMMENT 2
Regardless of which order the rows existed in the database proved to be extremely difficult. I obviously can't just order by date, as someone might comment on post 1 after post 2 has been posted. If I do a LEFT JOIN to get the parent post on all rows, and then sort by that date first, all the original posts group together as they'd have a value of null.
Then I got this idea:
CREATE TABLE wallposts
(
id uuid NOT NULL,
threadposted timestamp,
posted timestamp,
...
comment text
)
On an original post, threadposted and posted would be the same. On a comment, timestamp would be the time the original post was posted and "posted" would be the time the comment on that thread was posted. Now I can just do:
select * from wallposts order by threadposted, posted;
This works great, however one thing irks me. If two people create a post at the same time, comments on the two posts would get munged together as they'd have the same timestamp. I could use "ticks" instead of a datetime, but still the accuracy is only 1/1000 of a second. I could also setup a unique constraint on threadposted and posted which makes inserts a bit more expensive, but if I had multiple database servers in a farm, the chance of a collision is still there. I almost went ahead with this anyway since the chances of this happening are extremely small, but I wanted to see if I could eat my cake and still have it too. Mostly for my own educational curiosity.
Third solution would be to store this data in the form of a graph. Each node would have a v-left and v-right pointer. I could order by "left" which would traverse the tree in the order I need. However, every time someone inserts a comment I'd have to re balance the whole tree. This would create a ton of row locking, and all sorts of problems if the site was very busy. Plus, it's kinda extreme and also causes replication problems. So I tossed this idea quickly.
I also thought about just storing the original posts and then serializing the comments in a binary form, since who cares about individual comments. This would be very fast, however if a user wants to delete their comment or append a new comment to the end, I have to deserialize this data, modify the structure, then serialize it back and update the row. If a bunch of people are commenting on the same post at the same time, I might have random issues with that.
So here's what I eventually did. I query for all the posts ordered by date entered. In the middle ware layer, I loop through the recordset and create a "stack" of original posts, each node on the stack points to a linked list of comments. When I come across an original post, I push a new node on the stack and when I come across a comment I add a node to the linked list. I organize this in memory so I can traverse the recordset once and have O(n). After I create the in-memory representation of the wall, I traverse through this data structure again and write out HTML. This works great and has super fast inserts and super fast selects, and no weird row locking issues; however it's a bit heavier on my presentation layer and requires me to build an in memory representation of the user's wall to move stuff around so it's in the right order. Still, I believe this is the best approach I've found so far.
I thought I'd check with other SQL experts to see if there's a better way to do this using some weird JOINS or UNIONS or something which would still be performant with millions of users.
Upvotes: 1
Views: 625
Reputation: 365
If we stick to your table design … I think you would need some special value in the parentid column to separate original posts from comments (maybe just NULL, if you change definition of that column to nullable). Then, self-join will work. Something like this:
SELECT posts.comment as [Original Post],
comments.comment as Comment
FROM wallposts AS posts
LEFT OUTER JOIN wallposts AS comments
ON posts.id=comments.parentID
WHERE posts.parentID IS NULL
ORDER BY posts.posted, comments.posted
The result set shows Original Post before every comment, and has the right order.
(This was done using SQL Server, so I'm not sure if it works in your environment.)
Upvotes: 0
Reputation: 7501
You should look into "nested sets". They allow retrieving a hierarchy very easily with a single query. Here's an article about them
If you are using SQL server 2008, it has built-in support for it through the "hierarchyID" type.
Inserts and updates are more costly and complicated if you don't have the built in support), but querying is much faster and easier.
EDIT: Damn, missed the part where you already knew about it. (was checking from a mobile phone).
Upvotes: 0
Reputation: 28402
Do you want people to be able to comment on other comments, i.e. does the tree have infinite depth?
If you just want to have posts and then comments on those posts then you were on the right lines to start with and I believe the following SQL would meet that requirement (Untested so may be typos)
SELECT posts.id,
posts.posted AS posted_at,
posts.userid AS posted_by,
posts.posterid,
posts.comment AS post_text,
comments.posted AS commented_at,
comments.userid AS commented_by,
comments.comment AS comment_text
FROM wallposts AS posts
LEFT OUTER JOIN wallposts AS comments ON comments.parent_id = posts.id
ORDER BY posts.posted, comments.posted
This technique, a self-join, simply joins the table to itself using table aliases to specify the joins.
Upvotes: 0
Reputation: 88385
I think you're better off using a simpler model with a "ParentID" on Comment to allow for nesting comments. I don't think it's usually a good practice to use datetimes as keys, especially in this case, where you don't really need to, and an identity ID will be sufficient. Here's a basic example that might work:
Post
----
ID (PK)
Timestamp
UserID (FK)
Text
Comment
-------
ID (PK)
Timestamp
PostID (FK)
ParentCommentID (FK nullable) -- allows for nested comments
Text
Upvotes: 1