BN83
BN83

Reputation: 902

MySQL query with LEFT JOIN not returning empty results

I have created a calendar table which simply has a load of dates in it. My events table then has dates that line up and if there are no events for a day, I want to return a zero for that. I have the following:

SELECT cDate, Branch, IFNULL(COUNT(*),0) as count
FROM Events E LEFT JOIN Calendar C ON C.cDate = DATE(E.eventDate)
WHERE cDate BETWEEN '2018-04-14' AND '2018-04-18'
GROUP BY Branch, cDate
ORDER BY cDate

However results are currently showing:

  cDate    | Branch | count
2018-04-14 |   1    |  5
2018-04-14 |   2    |  4
2018-04-16 |   1    |  1
2018-04-16 |   2    |  3
2018-04-17 |   1    |  5
2018-04-18 |   1    |  4

However I intend for it to display any dates with a count of zero, like this:

  cDate    | Branch | count
2018-04-14 |   1    |  5
2018-04-14 |   2    |  4
2018-04-15 |   1    |  0
2018-04-15 |   2    |  0
2018-04-16 |   1    |  1
2018-04-16 |   2    |  3
2018-04-17 |   1    |  5
2018-04-17 |   2    |  0
2018-04-18 |   1    |  4
2018-04-18 |   2    |  0

Upvotes: 1

Views: 78

Answers (3)

spencer7593
spencer7593

Reputation: 108400

Any condition(s) in the WHERE clause that require column from the outer joined table to be non-NULL effectively "negates" the outer-ness of the join, rendering it equivalent to an inner join.

This condition

   cdate BETWEEN '2018-04-14' AND '2018-04-18'

will only be satisfied by rows that have non-NULL values of cdate.

It helps (me) to think about the left outer join operation in this way:

when a row from the left side has no matching rows from the right side, a dummy row is invented on the right side to serve as a matching row. (The join needs that matching row so the row can be returned.) The generated/invented dummy row consists entirely of NULL values.

So a partial fix for the behavior your are observing would be to relocate that condition from the WHERE clause into the ON clause of the outer join.

That change may be all that is needed to fix the problem, but... I hesitate to specifically recommend that as a solution, because I don't have an understanding of the actual specification.


Another suggestion:

As an aid future readers, consider qualifying all column references. (We notice the SQL statement is already assigning aliases to the tables.)

From the information posted in the question, we can't determine which table the branch column is from. It looks like Calendar might simply be a list of unique dates, so we're going to assume that the branch column is found in the Event table.


I suspect that the desired result would be returned by a query like this:

SELECT c.cdate
     , b.branch
     , COUNT(e.branch)  AS `count`
  FROM Calendar c
 CROSS
  JOIN Branch b 

  LEFT
  JOIN Events e
    ON e.eventdate  >= c.cdate
   AND e.eventdate   < c.cdate + INTERVAL 1 DAY
   AND e.branch      = b.branch

 WHERE c.cdate BETWEEN '2018-04-14' AND '2018-04-18'
 GROUP
    BY c.cdate
     , b.branch
 ORDER
    BY c.cdate
     , b.branch

Let's unpack that a bit.

We're getting all the dates in the specified range from Calendar. (We suspect/assume that cdate is DATE datatype, and is guaranteed to be unique. And in this query, we're basically using Calendar to generate a contiguous set of date values.)

And we're wanting to get a "count" of the number of Events related to each particular date from Calendar.

Note that the COUNT() aggregate is going to return a non-NULL value; if we are counting an expression that evaluates to NULL, the count won't be incremented. We don't need to wrap the COUNT() aggregate in an IFNULL/COALESCE/CASE to replace a NULL with a zero..

We're doing a "left join". That means we want the driving table (Calendar in this case) to be on the left side, and we want the table we're finding matches from to be on the right side. If a matching row is not found on the right side, a dummy row consisting of all NULL values will be "generated", so a joined row can be returned.

Since we want to get counts by "cdate" and by "branch", we also need a row source for the "branch" values. (As @Shadow notes, we could use a table in place of the inline view b. The purpose of the inline view b is to get a distinct list of branch values we want returned.)

The CROSS JOIN will get us a cross product. That is, all cdate values matched with all branch values, so we have a complete set. Five cdate values, two branch values, gets us a set of 10 rows, the rows we want to return. We need these rows to be able to return a count of "zero" when there are no matching Event rows for a given cdate and branch.

Again, we're assuming that cdate is unique in Calendar, and therefore we're returning (at most) five rows from Calendar. We're also assuming that there will be (potentially) many more rows from Event will need to be examined, to compare the eventdate to cdate. We don't want to prevent MySQL from making effective use of an index range operation on the eventdate column (with a suitable index available), so we avoid wrapping eventdate column in a function and reference the bare column instead.


We're just guessing at the requirements, so my suggestion may not satisfy the actual specification.


FOLLOWUP

We need a rowsource for the branch values. That can be a table, or an inline view query. Original SQL didn't assume a Branch table, so we used a query to get a distinct list of branches:

  JOIN  ( SELECT br.branch
           FROM Events br
          GROUP BY br.branch
       ) b

The inline view query in my original answer serves the same purpose as the Branch table in the revised query. It returns a distinct list of branch values that occur in the Events table. If an index with branch as the leading column is available, MySQL can make use of the index.

The big difference would be for a branch value (e.g. 3) that appears in the Branch table, but does not appear in the Event table. With the inline view of Event, we wouldn't return any rows for branch = 3.

Upvotes: 2

kc2018
kc2018

Reputation: 1460

I would approach this by using a cross join to link up the required dates in the calendar. And then join that with the Event table to get the counts of eventDate.

SELECT c.cDate, b.Branch, COUNT(e.EventDate) as count
FROM
(SELECT *
FROM Calendar C WHERE
cDate BETWEEN '2018-04-14' AND '2018-04-18' ) c
CROSS JOIN 
(SELECT distinct branch from Events ) b
LEFT JOIN
events e
ON c.cDate = DATE(e.EventDate) AND e.branch = b.branch
GROUP BY c.cDate, b.Branch
ORDER BY c.cDate, b.Branch

Upvotes: 0

Shadow
Shadow

Reputation: 34231

There are 2 issues:

  1. The calendar table has the complete list of dates, so it should be on the left hand side of the left join.

  2. You do not simply need the complete list of dates, you need the complete list of date - branch combinations.

I assume you have a branches table to store the complete list of branches. I cross join this on the calendar table and then left join the results on the actual events table:

select c.cdate, b.branch, count(e.eventdate)
from (branches b join calendar c)
left join events e on b.branch=e.branch and c.cdate=date(e.eventdate)
group by c.cdate, b.branch

Upvotes: 0

Related Questions