Reputation: 435
I'm trying to write a complex (at least, for my level of knowledge) string but I'm having one hell of a time.
Here's the problem. I have two tables, one named t1 and one named c1.
The tables are defined as follows:
table T1:
e_id, char(8),
e_date, datetime,
e_status, varchar(2)
table C1:
e_id, char(8),
e_date, datetime,
e_status, varchar(2)
Each table contains a list of identifiers that may or may not be found in both tables (and may or may not be unique within each table), an associated status (either 'OK' or 'R' in the T1 table, either 'OK' or 'C' in the C1 table), and a datetime, e_date, associated with each occurrence of an e_id.
I'm trying to write a query that will:
1. Return every T1 row from the last 30 days for each e_id that has an e_date that is within the last 24 hours.
2. Add the count of rows with e_status = 'OK' for each specific e_id found in the entire T1 table to the row results.
3. Add the count of rows with e_status = 'OK' for each specific e_id found in the entire C1 table to the row results.
I'll do my best to write some sample data/results here. For clarity, I will disregard the tables' data types. Assume the current date and time are 2012-Nov-08 19:00:00.
T1:
C1:
Running the query would yield:
e_id, e_date, e_status, r_count, c_count
1. e_id: 'A', e_date: 2012-Nov-08 10:00:00, e_status: 'OK', r_count: 6, c_count: 2
2. e_id: 'A', e_date: 2012-Nov-08 10:00:00, e_status: 'R', r_count: 6, c_count: 2
3. e_id: 'A', e_date: 2012-Oct-15 10:00:00, e_status: 'R', r_count: 6, c_count: 2
4. e_id: 'A', e_date: 2012-Oct-15 10:00:00, e_status: 'OK', r_count: 6, c_count: 2
5. e_id: 'A', e_date: 2012-Oct-15 10:00:00, e_status: 'R', r_count: 6, c_count: 2
6. e_id: 'A', e_date: 2012-Oct-15 10:00:00, e_status: 'R', r_count: 6, c_count: 2
I am really sorry, I have had to change the date on T1 rows 3 to 7 (rows 3 4 5 6 of the results) as the values were erroneous.
T1's Row 4 was not returned because no e_id: B was found in the last 24 hours
T1 Rows 8 and 9 were not returned because they were outside of the last 30 days
Upvotes: 1
Views: 139
Reputation: 753775
Time to do some TDQD — Test-Driven Query Design.
SELECT DISTINCT e_id
FROM T1
WHERE e_date >= DATE_SUB(NOW(), INTERVAL 24 HOUR)
This will be a prevalent sub-query in the other parts of the query.
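To check this first building block in isolation, here is a minimal sketch using Python's built-in sqlite3 module rather than MySQL. The sample rows are hypothetical (they are not the question's data), the fixed NOW value is the "current time" the question asks us to assume, and SQLite's datetime(:now, '-24 hours') is an assumed stand-in for DATE_SUB(NOW(), INTERVAL 24 HOUR).

```python
import sqlite3

# Hypothetical sample rows (not the question's data): (e_id, e_date, e_status)
T1_ROWS = [
    ("A", "2012-11-08 10:00:00", "OK"),  # within the last 24 hours
    ("A", "2012-10-15 10:00:00", "R"),   # older row for the same e_id
    ("B", "2012-11-01 09:00:00", "OK"),  # no entry in the last 24 hours
]
NOW = "2012-11-08 19:00:00"  # fixed "current time" assumed in the question


def recent_ids(rows, now):
    """Return the distinct e_id values with a T1 entry in the 24 hours before `now`."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t1 (e_id TEXT, e_date TEXT, e_status TEXT)")
    conn.executemany("INSERT INTO t1 VALUES (?, ?, ?)", rows)
    # SQLite equivalent (assumption) of the MySQL DATE_SUB(NOW(), INTERVAL 24 HOUR) filter
    sql = """
        SELECT DISTINCT e_id
          FROM t1
         WHERE e_date >= datetime(:now, '-24 hours')
         ORDER BY e_id
    """
    result = [row[0] for row in conn.execute(sql, {"now": now})]
    conn.close()
    return result


print(recent_ids(T1_ROWS, NOW))  # ['A']
```

Pinning the clock to a parameter instead of calling NOW() keeps the check repeatable; in MySQL the same idea could be approximated with a user variable.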
...where there was an entry in T1 within the last 24 hours.
SELECT a.e_id
  FROM t1 AS a
  JOIN (SELECT DISTINCT e_id
          FROM T1
         WHERE e_date >= DATE_SUB(NOW(), INTERVAL 24 HOUR)
       ) AS b ON b.e_id = a.e_id
 WHERE a.e_date >= DATE_SUB(NOW(), INTERVAL 30 DAY)
We can add other columns as we need them.
...where there was an entry in T1 within the last 24 hours
SELECT a.e_id, COUNT(*) AS r_count -- Per question; why not t_count?
  FROM t1 AS a
  JOIN (SELECT DISTINCT e_id
          FROM T1
         WHERE e_date >= DATE_SUB(NOW(), INTERVAL 24 HOUR)
       ) AS b ON b.e_id = a.e_id
 WHERE a.e_status = 'R'
 GROUP BY a.e_id
...where there was an entry in T1 within the last 24 hours
SELECT a.e_id, COUNT(*) AS c_count
  FROM c1 AS a
  JOIN (SELECT DISTINCT e_id
          FROM T1
         WHERE e_date >= DATE_SUB(NOW(), INTERVAL 24 HOUR)
       ) AS b ON b.e_id = a.e_id
 WHERE a.e_status = 'C'
 GROUP BY a.e_id
SELECT a.e_id, a.e_date, a.e_status, c.r_count, d.c_count
  FROM t1 AS a
  JOIN (SELECT DISTINCT e_id
          FROM T1
         WHERE e_date >= DATE_SUB(NOW(), INTERVAL 24 HOUR)
       ) AS b ON b.e_id = a.e_id
  LEFT JOIN -- Because there might be no OK rows in T1
       (SELECT a.e_id, COUNT(*) AS r_count
          FROM t1 AS a
          JOIN (SELECT DISTINCT e_id
                  FROM T1
                 WHERE e_date >= DATE_SUB(NOW(), INTERVAL 24 HOUR)
               ) AS b ON b.e_id = a.e_id
         WHERE a.e_status = 'OK'
         GROUP BY a.e_id
       ) AS c ON c.e_id = a.e_id
  LEFT JOIN -- Because there might be no OK rows in C1
       (SELECT a.e_id, COUNT(*) AS c_count
          FROM c1 AS a
          JOIN (SELECT DISTINCT e_id
                  FROM T1
                 WHERE e_date >= DATE_SUB(NOW(), INTERVAL 24 HOUR)
               ) AS b ON b.e_id = a.e_id
         WHERE a.e_status = 'OK'
         GROUP BY a.e_id
       ) AS d ON d.e_id = a.e_id
 WHERE a.e_date >= DATE_SUB(NOW(), INTERVAL 30 DAY)
You could probably write the two COUNT sub-queries without the 24-hour sub-sub-query, since the outer join already restricts the e_id values, but it is likely to be more efficient to eliminate as many rows as early as possible.
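To illustrate that claim, the sketch below runs the full query both with and without the 24 hour sub-sub-query inside the COUNT sub-queries and compares the results. It uses Python's sqlite3 module with hypothetical sample rows (not the question's data), a fixed NOW taken from the question, and SQLite's datetime() as an assumed substitute for DATE_SUB; the helper names are made up for this example. It only shows that the results agree on this data; it says nothing about which form MySQL would execute faster.

```python
import sqlite3

NOW = "2012-11-08 19:00:00"  # fixed "current time" assumed in the question

# Hypothetical sample data (e_id, e_date, e_status); not the question's rows.
T1_ROWS = [
    ("A", "2012-11-08 10:00:00", "OK"),
    ("A", "2012-10-15 10:00:00", "R"),
    ("A", "2012-09-01 10:00:00", "OK"),  # older than 30 days, but still counted
    ("B", "2012-11-01 09:00:00", "OK"),  # no entry in the last 24 hours
]
C1_ROWS = [
    ("A", "2012-11-02 08:00:00", "OK"),
    ("A", "2012-11-03 08:00:00", "C"),
    ("B", "2012-11-04 08:00:00", "OK"),
]

# The recurring "e_id seen in T1 within the last 24 hours" sub-query.
RECENT = "(SELECT DISTINCT e_id FROM t1 WHERE e_date >= datetime(:now, '-24 hours'))"


def count_subquery(table, name, filtered):
    """Per-e_id count of 'OK' rows, optionally pre-filtered to recent e_ids."""
    join = f"JOIN {RECENT} AS y ON y.e_id = x.e_id" if filtered else ""
    return (f"(SELECT x.e_id, COUNT(*) AS {name} FROM {table} AS x {join} "
            f"WHERE x.e_status = 'OK' GROUP BY x.e_id)")


def full_query(filtered):
    """Assemble the final query; `filtered` toggles the sub-sub-query."""
    return f"""
        SELECT a.e_id, a.e_date, a.e_status, c.r_count, d.c_count
          FROM t1 AS a
          JOIN {RECENT} AS b ON b.e_id = a.e_id
          LEFT JOIN {count_subquery('t1', 'r_count', filtered)} AS c ON c.e_id = a.e_id
          LEFT JOIN {count_subquery('c1', 'c_count', filtered)} AS d ON d.e_id = a.e_id
         WHERE a.e_date >= datetime(:now, '-30 days')
         ORDER BY a.e_id, a.e_date, a.e_status
    """


def run(filtered):
    """Load the sample data into a fresh in-memory database and run the query."""
    conn = sqlite3.connect(":memory:")
    for table, rows in (("t1", T1_ROWS), ("c1", C1_ROWS)):
        conn.execute(f"CREATE TABLE {table} (e_id TEXT, e_date TEXT, e_status TEXT)")
        conn.executemany(f"INSERT INTO {table} VALUES (?, ?, ?)", rows)
    result = conn.execute(full_query(filtered), {"now": NOW}).fetchall()
    conn.close()
    return result


with_filter = run(filtered=True)
without_filter = run(filtered=False)
print(with_filter == without_filter)  # True
for row in with_filter:
    print(row)
```

The results match because the outer join against the recent e_id list already discards any count rows for other identifiers; the inner filter only changes how much work the COUNT sub-queries do.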
One advantage of the concept behind TDQD is that you can check interim results. There were some basically trivial syntax issues (in part because MySQL is not my primary DBMS), but the change from JOIN to LEFT JOIN for the two COUNT sub-queries is the sort of thing you're apt to spot as you assemble the query. Trying to get everything right the first time is hard, if not futile. But the step-by-step build-up can give you confidence in what you've done. I'd never build a query as complex as this from scratch without testing the component sub-queries.
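The JOIN versus LEFT JOIN point is the kind of thing a quick interim test exposes. The following sketch, again using Python's sqlite3 module with hypothetical data and SQLite date functions as assumed stand-ins for the MySQL ones, attaches only the C1 count to the filtered T1 rows and runs the same query with both join kinds. The sample e_id has no 'OK' rows in C1, which is the case the LEFT JOIN is there to handle.

```python
import sqlite3

NOW = "2012-11-08 19:00:00"  # fixed "current time" assumed in the question

# Hypothetical data: e_id 'A' is recent in T1 but has no 'OK' rows in C1.
T1_ROWS = [
    ("A", "2012-11-08 10:00:00", "OK"),
    ("A", "2012-10-15 10:00:00", "R"),
]
C1_ROWS = [("A", "2012-11-03 08:00:00", "C")]

# {join_kind} is filled in with either "JOIN" or "LEFT JOIN".
QUERY = """
    SELECT a.e_id, a.e_date, a.e_status, d.c_count
      FROM t1 AS a
      JOIN (SELECT DISTINCT e_id FROM t1
             WHERE e_date >= datetime(:now, '-24 hours')) AS b ON b.e_id = a.e_id
      {join_kind} (SELECT e_id, COUNT(*) AS c_count FROM c1
             WHERE e_status = 'OK' GROUP BY e_id) AS d ON d.e_id = a.e_id
     WHERE a.e_date >= datetime(:now, '-30 days')
     ORDER BY a.e_date
"""


def run(join_kind):
    """Run the query with the given join kind against a fresh in-memory database."""
    conn = sqlite3.connect(":memory:")
    for table, rows in (("t1", T1_ROWS), ("c1", C1_ROWS)):
        conn.execute(f"CREATE TABLE {table} (e_id TEXT, e_date TEXT, e_status TEXT)")
        conn.executemany(f"INSERT INTO {table} VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY.format(join_kind=join_kind), {"now": NOW}).fetchall()
    conn.close()
    return result


inner_rows = run("JOIN")       # the T1 rows vanish: no matching count row exists
left_rows = run("LEFT JOIN")   # the T1 rows survive, with c_count as NULL (None)
print(len(inner_rows), len(left_rows))  # 0 2
```

With the inner join, an e_id that has no 'OK' rows in C1 silently drops out of the result. If a zero is preferable to NULL in the output, wrapping the column as COALESCE(d.c_count, 0) is one option.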
Thanks for the (minor) updates, FatalMojo.
Upvotes: 2