Reputation: 902
I have created a calendar table which simply has a load of dates in it. My events table then has dates that line up and if there are no events for a day, I want to return a zero for that. I have the following:
SELECT cDate, Branch, IFNULL(COUNT(*),0) as count
FROM Events E LEFT JOIN Calendar C ON C.cDate = DATE(E.eventDate)
WHERE cDate BETWEEN '2018-04-14' AND '2018-04-18'
GROUP BY Branch, cDate
ORDER BY cDate
However results are currently showing:
cDate | Branch | count
2018-04-14 | 1 | 5
2018-04-14 | 2 | 4
2018-04-16 | 1 | 1
2018-04-16 | 2 | 3
2018-04-17 | 1 | 5
2018-04-18 | 1 | 4
However I intend for it to display any dates with a count of zero, like this:
cDate | Branch | count
2018-04-14 | 1 | 5
2018-04-14 | 2 | 4
2018-04-15 | 1 | 0
2018-04-15 | 2 | 0
2018-04-16 | 1 | 1
2018-04-16 | 2 | 3
2018-04-17 | 1 | 5
2018-04-17 | 2 | 0
2018-04-18 | 1 | 4
2018-04-18 | 2 | 0
Upvotes: 1
Views: 78
Reputation: 108400
Any condition(s) in the WHERE
clause that require column from the outer joined table to be non-NULL effectively "negates" the outer-ness of the join, rendering it equivalent to an inner join.
This condition
cdate BETWEEN '2018-04-14' AND '2018-04-18'
will only be satisfied by rows that have non-NULL values of cdate
.
It helps (me) to think about the left outer join operation in this way:
when a row from the left side has no matching rows from the right side, a dummy row is invented on the right side to serve as a matching row. (The join needs that matching row so the row can be returned.) The generated/invented dummy row consists entirely of NULL
values.
So a partial fix for the behavior your are observing would be to relocate that condition from the WHERE
clause into the ON
clause of the outer join.
That change may be all that is needed to fix the problem, but... I hesitate to specifically recommend that as a solution, because I don't have an understanding of the actual specification.
Another suggestion:
As an aid future readers, consider qualifying all column references. (We notice the SQL statement is already assigning aliases to the tables.)
From the information posted in the question, we can't determine which table the branch
column is from. It looks like Calendar
might simply be a list of unique dates, so we're going to assume that the branch
column is found in the Event
table.
I suspect that the desired result would be returned by a query like this:
SELECT c.cdate
, b.branch
, COUNT(e.branch) AS `count`
FROM Calendar c
CROSS
JOIN Branch b
LEFT
JOIN Events e
ON e.eventdate >= c.cdate
AND e.eventdate < c.cdate + INTERVAL 1 DAY
AND e.branch = b.branch
WHERE c.cdate BETWEEN '2018-04-14' AND '2018-04-18'
GROUP
BY c.cdate
, b.branch
ORDER
BY c.cdate
, b.branch
Let's unpack that a bit.
We're getting all the dates in the specified range from Calendar
. (We suspect/assume that cdate
is DATE datatype, and is guaranteed to be unique. And in this query, we're basically using Calendar
to generate a contiguous set of date values.)
And we're wanting to get a "count" of the number of Events
related to each particular date from Calendar
.
Note that the COUNT()
aggregate is going to return a non-NULL value; if we are counting an expression that evaluates to NULL, the count won't be incremented. We don't need to wrap the COUNT()
aggregate in an IFNULL/COALESCE/CASE to replace a NULL with a zero..
We're doing a "left join". That means we want the driving table (Calendar
in this case) to be on the left side, and we want the table we're finding matches from to be on the right side. If a matching row is not found on the right side, a dummy row consisting of all NULL values will be "generated", so a joined row can be returned.
Since we want to get counts by "cdate
" and by "branch
", we also need a row source for the "branch
" values. (As @Shadow notes, we could use a table in place of the inline view b
. The purpose of the inline view b
is to get a distinct list of branch
values we want returned.)
The CROSS JOIN
will get us a cross product. That is, all cdate
values matched with all branch
values, so we have a complete set. Five cdate
values, two branch
values, gets us a set of 10 rows, the rows we want to return. We need these rows to be able to return a count of "zero" when there are no matching Event
rows for a given cdate
and branch
.
Again, we're assuming that cdate
is unique in Calendar
, and therefore we're returning (at most) five rows from Calendar
. We're also assuming that there will be (potentially) many more rows from Event
will need to be examined, to compare the eventdate
to cdate
. We don't want to prevent MySQL from making effective use of an index range operation on the eventdate
column (with a suitable index available), so we avoid wrapping eventdate
column in a function and reference the bare column instead.
We're just guessing at the requirements, so my suggestion may not satisfy the actual specification.
FOLLOWUP
We need a rowsource for the branch
values. That can be a table, or an inline view query. Original SQL didn't assume a Branch
table, so we used a query to get a distinct list of branches:
JOIN ( SELECT br.branch
FROM Events br
GROUP BY br.branch
) b
The inline view query in my original answer serves the same purpose as the Branch
table in the revised query. It returns a distinct list of branch
values that occur in the Events
table. If an index with branch
as the leading column is available, MySQL can make use of the index.
The big difference would be for a branch
value (e.g. 3) that appears in the Branch
table, but does not appear in the Event
table. With the inline view of Event
, we wouldn't return any rows for branch
= 3.
Upvotes: 2
Reputation: 1460
I would approach this by using a cross join to link up the required dates in the calendar. And then join that with the Event table to get the counts of eventDate.
SELECT c.cDate, b.Branch, COUNT(e.EventDate) as count
FROM
(SELECT *
FROM Calendar C WHERE
cDate BETWEEN '2018-04-14' AND '2018-04-18' ) c
CROSS JOIN
(SELECT distinct branch from Events ) b
LEFT JOIN
events e
ON c.cDate = DATE(e.EventDate) AND e.branch = b.branch
GROUP BY c.cDate, b.Branch
ORDER BY c.cDate, b.Branch
Upvotes: 0
Reputation: 34231
There are 2 issues:
The calendar table has the complete list of dates, so it should be on the left hand side of the left join.
You do not simply need the complete list of dates, you need the complete list of date - branch combinations.
I assume you have a branches table to store the complete list of branches. I cross join this on the calendar table and then left join the results on the actual events table:
select c.cdate, b.branch, count(e.eventdate)
from (branches b join calendar c)
left join events e on b.branch=e.branch and c.cdate=date(e.eventdate)
group by c.cdate, b.branch
Upvotes: 0