Reputation: 4869
I have the following two tables:
+------+-----------+ | Id | Followers | +------+-----------+ | 874 | 55542 | | 1081 | 330624 | | 1378 | 17919 | | 1621 | 920 | | 1688 | 255463 | | 2953 | 751 | | 3382 | 204466 | | 3840 | 273489 | | 4145 | 376 | | ... | ... | +------+-----------+
+---------------------+---------+--------------------+ | Date | User_Id | has_original_tweet | +---------------------+---------+--------------------+ | 2008-02-22 12:00:00 | 676493 | 2 | | 2008-02-22 12:00:00 | 815263 | 1 | | 2008-02-22 12:00:00 | 6245822 | 1 | | 2008-02-22 12:00:00 | 8854092 | 1 | | 2008-02-23 12:00:00 | 676493 | 2 | | 2008-02-23 12:00:00 | 815263 | 1 | | 2008-02-23 12:00:00 | 6245822 | 1 | | 2008-02-23 12:00:00 | 8854092 | 1 | | 2008-02-24 12:00:00 | 676493 | 2 | | ............. | ... | .. | +---------------------+---------+--------------------+
I am running the following join query on these tables:
SELECT sum(has_original_tweet), b.Id
FROM temporal_users AS a
RIGHT JOIN TempUser22 AS b
ON a.User_ID = b.Id
GROUP BY b.Id;
Which returns 57,00 rows as expected, with NULL answers on the first field:
+-------------------------+------+ | sum(has_original_tweet) | Id | +-------------------------+------+ | NULL | 874 | | NULL | 1081 | | 135 | 1378 | | 164 | 1621 | | 652 | 1688 | | 691 | 2953 | | NULL | 3382 | | NULL | 3840 | | NULL | 4145 | | ... | .... | +-------------------------+------+
However, when adding the WHERE line specifying a date as below:
SELECT sum(has_original_tweet), b.Id
FROM temporal_users AS a
RIGHT JOIN TempUser22 AS b
ON a.User_ID = b.Id
WHERE a.Date BETWEEN '2010-12-31-00:00:00' AND '2010-12-31-23:59:59'
GROUP BY b.Id;
I receive the following answer, of only 3200 rows, and without any NULL in the first field.
+-------------------------+---------+ | sum(has_original_tweet) | Id | +-------------------------+---------+ | 1 | 797194 | | 1 | 815263 | | 0 | 820678 | | 1 | 1427511 | | 0 | 4653731 | | 1 | 5933862 | | 2 | 7530552 | | 1 | 7674072 | | 1 | 8149632 | | .. | .... | +-------------------------+---------+
My question is: How to get, for a given date, an answer of size 57,000 rows for each user in TempUser22 with NULL values when has_original_tweet is not present in temporal_user for the given date?
Thanks.
Upvotes: 1
Views: 1713
Reputation: 222
SELECT b.Id, SUM(a.has_original_tweet) s
FROM TempUser22 b
LEFT JOIN temporal_users a ON b.Id = a.User_Id
AND a.Date BETWEEN '2010-12-31-00:00:00' AND '2010-12-31-23:59:59'
GROUP BY b.Id;
Id s
1 null
2 1
3 null
4 3
5 null
6 null
For debugging, I used:
CREATE TEMPORARY TABLE TempUser22(Id INT, Followers INT)
SELECT 1 Id, 10 Followers UNION ALL
SELECT 2, 20 UNION ALL
SELECT 3, 30 UNION ALL
SELECT 4, 40 UNION ALL
SELECT 5, 50 UNION ALL
SELECT 6, 60
;
CREATE TEMPORARY TABLE temporal_users(`Date` DATETIME, User_Id INT, has_original_tweet INT)
SELECT '2008-02-22 12:00:00' `Date`, 1 User_Id, 1 has_original_tweet UNION ALL
SELECT '2008-12-31 12:00:00', 2, 1 UNION ALL
SELECT '2010-12-31 12:00:00', 2, 1 UNION ALL
SELECT '2012-12-31 12:00:00', 2, 1 UNION ALL
SELECT '2008-12-31 12:00:00', 4, 9 UNION ALL
SELECT '2010-12-31 12:00:00', 4, 1 UNION ALL
SELECT '2010-12-31 12:00:00', 4, 2 UNION ALL
SELECT '2012-12-31 12:00:00', 4, 9
;
Upvotes: 4
Reputation: 5916
That's because NULL
values will always be discarded from the where
clause
You can use a coalesce in your where
clause.
WHERE coalesce(a.Date, 'some-date-in-the-range') BETWEEN '2010-12-31-00:00:00' AND '2010-12-31-23:59:59'
With this instead, you force null
values to be considered as valid.
Upvotes: 1