Codist
Codist

Reputation: 1208

A SQL Join returns duplicate data and messes with SQL query

I've been regressively testing my bulletin board I recently developed. I came across a small bug in my SQL query.

SELECT `user`.avatar, `user`.username, `user`.id AS user_id, `post`.title, 
       `post`.id, `post`.date AS post_date, `comment`.date AS last_comment_date
FROM `post`
INNER JOIN `user` ON `post`.user_id = `user`.id
LEFT JOIN `comment` ON `post`.id = `comment`.post_id
WHERE `post`.category_id = 1
ORDER BY IFNULL(`last_comment_date`, `post_date`) DESC;

Will produce something like;

duplicate data

I happen to know the cause of this. MySQL joins all the comments with their post, in other words, its a one to many relationship, a post can have many comments. This creates issues with pagination as the first ten posts could be the same depending on whether or not it has ten comments. I happen to only need the latest comment date from the comment table.

I came across a couple of solutions.

What is the best thing to do? Is there anyway to stop MySQL from returning duplicate data on a join if it is a one to many relationship?

EDIT 1

As @Zane asked, this SQL query...

SELECT `user`.avatar, `user`.username, `user`.id AS user_id, `post`.title, `post`.id, 
`post`.date AS post_date, MAX(`comment`.date) 
FROM `post` 
INNER JOIN `user` ON `post`.user_id = `user`.id 
LEFT JOIN `comment` ON `post`.id = `comment`.post_id 
WHERE `post`.category_id = 1 
GROUP BY `post`.id 
ORDER BY IFNULL(MAX(`comment`.date), `post_date`) DESC

produces;

non-duplicate data

Which seems like exactly what I want. The posts are being sorted by the latest comment date if they have comments (or post date) if they don't. I can use this with pagination LIMITS as well without any preprocessing. Do I still need IFNULL? I'd think so.

EDiT 2

There is still a defect. If a post has a NULL last_comment_date it doesn't sort correctly. If you refer to the the image below. You can see that the last_comment_date is sitting there even though a bump should've occurred.

doesn't sort properly

Please ignore what's below.


EDIT 3

This is what I want.

what I need

Upvotes: 0

Views: 177

Answers (1)

timo.rieber
timo.rieber

Reputation: 3867

I think the GROUP BY is your best choice. Simply group by all selected columns except comment.date and use MAX-function to get the latest comment date like this:

SELECT
    `user`.avatar,
    `user`.username,
    `user`.id AS user_id,
    `post`.title,
    `post`.id,
    `post`.date AS post_date,
    MAX(`comment`.date) AS last_comment_date

FROM
    `post` INNER JOIN `user` ON `post`.user_id = `user`.id
    LEFT JOIN `comment` ON `post`.id = `comment`.post_id

WHERE
    `post`.category_id = 1

GROUP BY
    `user`.avatar,
    `user`.username,
    `user`.id,
    `post`.title,
    `post`.id,
    `post`.date

ORDER BY
    IFNULL(`last_comment_date`, `post_date`) DESC;

Upvotes: 2

Related Questions