php_nub_qq
php_nub_qq

Reputation: 16017

Count rows from different tables in join statement

I have 2 tables - comments and ratings. The comments table contains a column reply which indicates whether a comment is a reply to another comment. The ratings table contains ratings for the comments in the form of comment_id, user_id, rating

When I am selecting comments to display it is a bit complex so I'll try to simplify as much as I can

SELECT
COALESCE(SUM(cr.vote), 0) AS rating,
COUNT(r.id) AS replies

FROM comments c 
LEFT JOIN comments_ratings cr ON c.id = cr.comment
LEFT JOIN comments r ON c.id = r.reply

WHERE c.id = 1

GROUP BY c.id;

Here is the testing setup

CREATE TABLE `comments` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `text` text NOT NULL,
  `author` int(10) unsigned NOT NULL,
  `time` datetime DEFAULT NULL,
  `reply` int(10) unsigned DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `reply` (`reply`),
  CONSTRAINT `comments_ibfk_1` FOREIGN KEY (`reply`) REFERENCES `comments` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8

CREATE TABLE `comments_ratings` (
  `comment` int(10) unsigned NOT NULL,
  `user` int(10) unsigned NOT NULL,
  `vote` tinyint(4) NOT NULL,
  PRIMARY KEY (`comment`,`user`),
  KEY `user` (`user`),
  CONSTRAINT `comments_ratings_ibfk_1` FOREIGN KEY (`comment`) REFERENCES `comments` (`id`) ON DELETE CASCADE ON UPDATE NO ACTION,
  -- CONSTRAINT `comments_ratings_ibfk_2` FOREIGN KEY (`user`) REFERENCES `users` (`id`) ON DELETE CASCADE ON UPDATE NO ACTION
) ENGINE=InnoDB DEFAULT CHARSET=utf8

INSERT INTO comments (id, reply, text, author) VALUES (1, null, '', 0), (null, 1, '', 0),(null, 1, '', 0),(null, 1, '', 0);
INSERT INTO comments_ratings (comment, user, vote) VALUES (1, 1, 1);

Now if you execute the select statement you will see rating becomes 3 even though there is only 1 record in comments_ratings with value 1. If I add another reply it will become 4. If you add another comments_ratings record with value 1 it will double and become 8. That is because each row from the joins is copying information in fields it doesn't have.

Can you help me set up the join on r so that it doesn't double the rating and replies.

Upvotes: 0

Views: 86

Answers (3)

php_nub_qq
php_nub_qq

Reputation: 16017

Update: Even though both answers were correct, I have currently been testing this set up with large volumes of data and the performance is next to awful. After a short investigation I determined the reason - basically the suggested solutions create a temporary table in memory in which all data from the table is filled on each query, as the amount of data increases this time also increases and on a fairly weak server that I'm running I get a query time of above 5 seconds for a couple thousand rows.

I have come up with a solution for that problem, it still uses temporary tables but instead of copying the entire table into the temp one it only copies the range of records that are being selected, here it is:

SELECT
    c.*,
  COUNT(r.id) AS replies
FROM
    (
        SELECT
            c.id,
            c.text,
            c.time,
            c.author AS author_id,
            SUM(cr.vote) AS rating,
            crv.vote AS voted
        FROM
            comments c
        LEFT JOIN users u ON u.id = c.author
        LEFT JOIN comments_ratings cr ON cr. COMMENT = c.id
        LEFT JOIN comments_ratings crv ON crv. COMMENT = c.id
        AND crv. USER = ?
        WHERE
            c.item = ?
        AND c.type = ?
        AND c.id < ?
        GROUP BY
            c.id
        ORDER BY
            c.id DESC
        LIMIT 0,
        100
    ) AS c
LEFT JOIN comments r ON c.id = r.reply
GROUP BY
    c.id
ORDER BY
    c.id DESC

I tested this method with 4+ million records in the table and the queries were executed in less than 10 milliseconds on a pretty weak server machine.

Upvotes: 0

shA.t
shA.t

Reputation: 16958

When you have some LEFT JOINs from some sub-tables to one super-table you should remember that rows of your super-table will be repeated by both of sub-tables, So you should change your query to something like this:

SELECT
    COALESCE(SUM(cr.vote), 0) AS rating,
    COALESCE(SUM(r.cnt), 0) AS replies
FROM 
    comments c 
LEFT JOIN 
    (SELECT
        cri.comment,
        SUM(cri.vote) As vote
     FROM
        comments_ratings cri
     GROUP BY
        cri.comment
    )cr ON c.id = cr.comment
LEFT JOIN 
    (SELECT  
        ci.reply,
        COUNT(ci.id) cnt
     FROM 
        comments ci
     GROUP BY
        ci.reply
    ) AS r ON c.id = r.reply
WHERE 
    c.id = 1
GROUP BY 
    c.id;

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1269503

One method is to pre-aggregate the data before the join. Something like this:

FROM comments c LEFT JOIN
     (SELECT cr.comment, SUM(cr.vote) as vote
      FROM comments_ratings cr
      GROUP BY cr.comment
     ) cr
     ON c.id = cr.comment LEFT JOIN
     comments r
     ON c.id = r.reply

You might also want to include filtering conditions in the subquery, for performance reasons.

Upvotes: 1

Related Questions