Hzmy
Hzmy

Reputation: 779

MySQL Double count on left join

Database Structure

CREATE TABLE installs(
    id INT, PRIMARY KEY(id), 
    created DATETIME)

CREATE TABLE uninstalls(
    id INT, PRIMARY KEY(id),
    created DATETIME,
    install_id INT)

The Query ("Me vs. The MySQL")

SELECT DATE(installs.created),
  COUNT(installs.id),
  COUNT(uninstall.id)
FROM installs
LEFT JOIN uninstalls ON uninstalls.install_id = installs.id
GROUP BY DATE(installs.created)

The "Expected" Output

DATE(installs.created) | COUNT(installs.id) | COUNT(uninstalls.id) 
  2012-11-20           | *installs on date* | *uninstalls on date*

So - I am looking a row per day, with the number of installs/uninstalls that happened on that day.

The Problem

The data for the 'installs' is correct for each day. BUT the data for the 'uninstalls' for each day is sadly incorrect.

Upvotes: 7

Views: 3696

Answers (3)

Terje D.
Terje D.

Reputation: 6315

Count the installs and the uninstall separately, adding a column (of zeroes) for the other count to each of them. Then combine the two with UNION, group by date once again and take the max for each date (to eliminate the added zeroes):

SELECT created as date, max(installs) as installs, max(uninstalls) as uninstalls
FROM
  (SELECT created, count(*) AS installs, 0 AS uninstalls
   FROM installs
   GROUP BY created
  UNION ALL
   SELECT created, 0 AS installs, count(*) AS uninstalls
   FROM uninstalls
   GROUP BY created) c
GROUP BY created
ORDER BY created

Upvotes: 4

Nick Vaccaro
Nick Vaccaro

Reputation: 5504

Create a CTE that includes all dates for installs and uninstalls, then join to install and uninstall table separately.

; WITH Dates AS
(
    SELECT DATE(created) AS created
    FROM
    (
        SELECT created
        FROM installs

            UNION ALL

        SELECT created
        FROM uninstalls
    ) a
    GROUP BY DATE(created)
)
, InstallCTE AS
(
    SELECT DATE(created) AS created
        , COUNT(*) AS c
    FROM installs
    GROUP BY DATE(created)
)
, UninstallCTE AS
(
    SELECT DATE(created) AS created
        , COUNT(*) AS c
    FROM uninstalls
    GROUP BY DATE(created)
)
SELECT d.created AS [date]
    , i.c AS install_count
    , u.c AS uninstall_count
FROM Dates d
LEFT JOIN InstallCTE i
    ON d.created = i.created
LEFT JOIN UninstallCTE u
    ON d.created = u.created

Upvotes: 0

Jeremy Smyth
Jeremy Smyth

Reputation: 23503

This problem is happening because you're linking the installs and uninstalls. That is, you're showing the installs on a given date, and the uninstalls for those installs, not for the date of the installs.

If you want to see the count of each event on a given date, you'll need to join on the date, not the install ID. If it's a large data set, it's going to be very slow (the optimizer can't use indexes on a field when you group by or join based on a function on that field).

SELECT DATE(installs.created),
  COUNT(installs.id),
  COUNT(uninstall.id)
FROM installs 
LEFT JOIN uninstalls ON DATE(uninstalls.created) = DATE(installs.created)
GROUP BY DATE(installs.created)

Upvotes: 0

Related Questions