ylnor
ylnor

Reputation: 4869

Include NULL in SQL Join when using WHERE

I have the following two tables:

+------+-----------+
| Id   | Followers |
+------+-----------+
|  874 |     55542 |
| 1081 |    330624 |
| 1378 |     17919 |
| 1621 |       920 |
| 1688 |    255463 |
| 2953 |       751 |
| 3382 |    204466 |
| 3840 |    273489 |
| 4145 |       376 |
| ...  |       ... |
+------+-----------+
+---------------------+---------+--------------------+
| Date                | User_Id | has_original_tweet |
+---------------------+---------+--------------------+
| 2008-02-22 12:00:00 |  676493 |                  2 |
| 2008-02-22 12:00:00 |  815263 |                  1 |
| 2008-02-22 12:00:00 | 6245822 |                  1 |
| 2008-02-22 12:00:00 | 8854092 |                  1 |
| 2008-02-23 12:00:00 |  676493 |                  2 |
| 2008-02-23 12:00:00 |  815263 |                  1 |
| 2008-02-23 12:00:00 | 6245822 |                  1 |
| 2008-02-23 12:00:00 | 8854092 |                  1 |
| 2008-02-24 12:00:00 |  676493 |                  2 |
|    .............    |   ...   |                 .. |
+---------------------+---------+--------------------+

I am running the following join query on these tables:

SELECT sum(has_original_tweet), b.Id
  FROM temporal_users AS a
  RIGHT JOIN TempUser22 AS b
  ON a.User_ID = b.Id
  GROUP BY b.Id;

Which returns 57,00 rows as expected, with NULL answers on the first field:

+-------------------------+------+
| sum(has_original_tweet) | Id   |
+-------------------------+------+
|                    NULL |  874 |
|                    NULL | 1081 |
|                     135 | 1378 |
|                     164 | 1621 |
|                     652 | 1688 |
|                     691 | 2953 |
|                    NULL | 3382 |
|                    NULL | 3840 |
|                    NULL | 4145 |
|                     ... | .... |
+-------------------------+------+

However, when adding the WHERE line specifying a date as below:

SELECT sum(has_original_tweet), b.Id
  FROM temporal_users AS a
  RIGHT JOIN TempUser22 AS b
  ON a.User_ID = b.Id
  WHERE a.Date BETWEEN '2010-12-31-00:00:00' AND '2010-12-31-23:59:59'
  GROUP BY b.Id;

I receive the following answer, of only 3200 rows, and without any NULL in the first field.

+-------------------------+---------+
| sum(has_original_tweet) | Id      |
+-------------------------+---------+
|                       1 |  797194 |
|                       1 |  815263 |
|                       0 |  820678 |
|                       1 | 1427511 |
|                       0 | 4653731 |
|                       1 | 5933862 |
|                       2 | 7530552 |
|                       1 | 7674072 |
|                       1 | 8149632 |
|                      .. |   ....  |
+-------------------------+---------+

My question is: How to get, for a given date, an answer of size 57,000 rows for each user in TempUser22 with NULL values when has_original_tweet is not present in temporal_user for the given date?

Thanks.

Upvotes: 1

Views: 1713

Answers (2)

Lexx918
Lexx918

Reputation: 222

SELECT b.Id, SUM(a.has_original_tweet) s
FROM TempUser22 b
LEFT JOIN temporal_users a ON b.Id = a.User_Id
  AND a.Date BETWEEN '2010-12-31-00:00:00' AND '2010-12-31-23:59:59'
GROUP BY b.Id;

Id  s
1   null
2   1
3   null
4   3
5   null
6   null

For debugging, I used:

CREATE TEMPORARY TABLE TempUser22(Id INT, Followers INT)
  SELECT 1 Id, 10 Followers UNION ALL
  SELECT 2, 20 UNION ALL
  SELECT 3, 30 UNION ALL
  SELECT 4, 40 UNION ALL
  SELECT 5, 50 UNION ALL
  SELECT 6, 60
;
CREATE TEMPORARY TABLE temporal_users(`Date` DATETIME, User_Id INT, has_original_tweet INT)
  SELECT '2008-02-22 12:00:00' `Date`, 1 User_Id, 1 has_original_tweet UNION ALL

  SELECT '2008-12-31 12:00:00', 2, 1 UNION ALL
  SELECT '2010-12-31 12:00:00', 2, 1 UNION ALL
  SELECT '2012-12-31 12:00:00', 2, 1 UNION ALL

  SELECT '2008-12-31 12:00:00', 4, 9 UNION ALL
  SELECT '2010-12-31 12:00:00', 4, 1 UNION ALL
  SELECT '2010-12-31 12:00:00', 4, 2 UNION ALL
  SELECT '2012-12-31 12:00:00', 4, 9
;

Upvotes: 4

Stefano Zanini
Stefano Zanini

Reputation: 5916

That's because NULL values will always be discarded from the where clause

You can use a coalesce in your where clause.

WHERE coalesce(a.Date, 'some-date-in-the-range') BETWEEN '2010-12-31-00:00:00' AND '2010-12-31-23:59:59'

With this instead, you force null values to be considered as valid.

Upvotes: 1

Related Questions