Morysh
Morysh

Reputation: 71

Should I duplicate data or use complex queries

I'm trying to make a forum from scratch, mainly for practice and maybe reuse it later.
I'm currently working on the database, and there's one thing that bugs me. I have a table Thread which must contain a reference the the first post, the author, etc... But I'm not sure how to do it properly.
(1) My first thought was "let's make the less dupplicates possible", so I thought I would have a basic Thread table

CREATE TABLE Thread(
    id INT UNSIGNED NOT NULL AUTO_INCREMENT,
    title VARCHAR(40) NOT NULL,
    category VARCHAR(40) NOT NULL,
);

and a table Post

CREATE TABLE Post(
    id INT UNSIGNED NOT NULL AUTO_INCREMENT,
    author VARCHAR(40) NOT NULL,
    thread INT NOT NULL,
    creationDate DATETIME NOT NULL,
    content TEXT NOT NULL,
    PRIMARY KEY (id),
    CONSTRAINT fk_Post-author_UserProfile-name FOREIGN KEY (author)
    REFERENCES UserProfile(name) ON UPDATE CASCADE,
    CONSTRAINT fk_Post-thread_Thread-id FOREIGN KEY (thread) 
    REFERENCES Thread(id)
);

so that when I want to know the first post of a thread, I retrieve all the posts from that thread sorted by date and take the first one. EZ money ! But then I thought, if I want to get just a thread, its author and creation date (which seems pretty legit to me) knowing its title, that's gonna be a huge query with joins and might be very slow and cumbersome.

SELECT Thread.title, posts.creator, posts.creationDate
FROM Thread
JOIN (
    SELECT Post.thread as thread, Post.creationDate, UserProfile.name AS creator
    FROM Post
    JOIN UserProfile
        ON Post.author = UserProfile.id
) AS posts
    ON Thread.id = posts.thread
WHERE Thread.title = 'Boy, that''s one hell of a query you''ve got here !'
ORDER BY creationDate
LIMIT 1;

(I'm not sure about this query, I'm a newbie)

(2) "Alright then ! Just add a 'firstPost' column to the Thread table" you'll tell me, but then why not also duplicate the author reference ? Where does this stops ?
My guess will be that option (1) is too heavy, and maybe and option (2) with just a reference to the first post might be a better option, but not duplicating the author reference directly onto Thread table. Which should give something like the following if I'm not mistaking, but adds a column.

SELECT Thread.title, UserProfile.name, Post.creationDate
FROM Thread
JOIN Post
    ON Thread.firstPost = Post.id
JOIN UserProfile
    ON Post.author = UserProfile.id
WHERE Thread.title = 'Boy, that''s one hell of a query you''ve got here !';

What do you think about it, keeping in mind that as a didactic project I don't just want it to work, but also to be "clean" ?

Upvotes: 0

Views: 290

Answers (2)

Thorsten Kettner
Thorsten Kettner

Reputation: 95101

I understand your thinking. Yes, a thread consists of a series of posts, but the first post is different from the others, as this is what the thread is about, whereas the other posts are mere replies to that first post.

From that point of view we could argue that not only the posts belong to a thread (and so a post table must contain the thread ID), but also the thread is about a (first) post and should hence contain the post ID. This is a possible model, but resulting in a hen and egg problem: each post must belong to an existing thread and each thread must refer to an existing post. One could solve this by writing a thread without a post reference, then write the first post, then update the thread. Which would work, but is not an elegant solution. Another option would be to write both records in one transaction (the thread already containing a post ID; the post containing the thread ID) and have the constraints only work on commit. This is called deferred constraints and only available in few DBMS. I am pretty sure, MySQL doesn't support these.

But of course you could keep it simple, saying one post is as good as the other in the datamodel and one happens to be the first. This is what you have already and it is not wrong. It is a valid model. I would, however, suggest to use a model with a coumpound key and a post number rather than an ID:

  • thread (thread_id, title, category_id)
  • post (thread_id, post_num, author_id, creation_time, content)

where the primary key is bold. A first post would have post_num 1 of course, the second #2 and so on. This would make it super simple to retrieve the threads' first posts.

But there is another thing that comes to mind. In many forums the threads don't consist of a chain of posts, but of a tree of posts. That means: except for the first one, a post always refers to another post it is replying to. That could lead to the following model:

  • thread (thread_id, title, category_id)
  • post (thread_id, post_num, author_id, creation_time, content, parent_post_num)

So a record in the post table can refer to another record in the post table (the post it replies to). Here the compound key brings another advantage: Both post_num and parant_post_num have only relevance in combination with thread_id. So the constraint FOREIGN KEY (thread_id, parent_post_num) REFERENCES (thread_id, post_num) would work great, as a post could not have a parent post in a foreign thread (which it could with non-compound keys).

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1271231

Your initial instinct is correct -- don't duplicate the data, unless you really need to. Your queries really not complicated. And, maintaining duplicated data is quite complicated.

Your first query should be:

SELECT t.title, up.name as creator, p.creationDate
FROM Thread t JOIN 
     Post p
     ON p.thread = t.id JOIN
     UserProfile up
     ON p.author = up.id
WHERE t.title = 'Boy, that''s one hell of a query you''ve got here !'
ORDER BY p.creationDate
LIMIT 1;

Notes:

  • In MySQL, don't use subqueries in the FROM clause unless you need to. And you do't need to.
  • Table aliases make the query easier to write and to read.
  • Also, if you are learning SQL, I would advise you to give your primary keys and foreign keys the same name. So, threadId instead of id and thread.

Upvotes: 1

Related Questions