Reputation: 8817
This query does a group by on lead_source_id:
SELECT ch.lead_source_id,
Count(DISTINCT ch.repurchased_date)
FROM customers_history ch
WHERE ch.repurchased_date >= '2014-04-01'
AND ch.repurchased_date < '2014-05-01'
AND ch.lead_source_id IS NOT NULL
GROUP BY ch.lead_source_id;
And this query totals the records in the table:
SELECT Count(DISTINCT( repurchased_date ))
FROM customers_history
INNER JOIN (SELECT DISTINCT( customer_id ) AS xcid
FROM customers_history
WHERE repurchased_date >= '2014-04-01'
AND repurchased_date < '2014-05-01'
AND lead_source_id IS NOT NULL) AS Temp
ON Temp.xcid = customer_id
WHERE repurchased_date >= '2014-04-01'
AND repurchased_date < '2014-05-01'
AND lead_source_id IS NOT NULL;
On our production data, the totals from Query1 come to 7963, but the second query prints 7905. Why the difference and how can we fix our queries?
Here's our table layout:
+--------+-------------+----------------+---------------------+--------+
| id | customer_id | lead_source_id | repurchased_date | Rating |
+--------+-------------+----------------+---------------------+--------+
| 422923 | 420450 | 4 | 2014-04-14 09:16:48 | Warm |
| 422924 | 420450 | 4 | 2014-04-14 09:16:48 | Cold |
| 422956 | 420450 | 4 | 2014-04-14 09:16:49 | Hot |
| 422933 | 420451 | 37 | 2014-04-14 09:18:41 | Hot |
| 422938 | 420452 | 1 | 2014-04-10 20:50:30 | Hot |
| 422984 | 420452 | 1 | 2014-04-12 20:50:30 | Warm |
| 422940 | 420453 | 47 | 2014-04-14 09:20:27 | Hot |
+--------+-------------+----------------+---------------------+--------+
To answer some of the possibilities about nulls:
select count(id) from customers_history where customer_id is null
: 0
select count(id) from customers_history where lead_source_id is null
: 5103
select count(id) from customers_history where repurchased_date is null
: 0
Upvotes: 1
Views: 80
Reputation: 1269513
The most obvious conclusion is that some lead_source_id
s share values of repurchased_date
.
Another possibility is that you have NULL
values for customer_id
and the second filters these out.
The third possibility is that NULL
values of lead_source_id
are adding additional values in the first query.
Upvotes: 2