Reputation: 1401
We are trying to find a solution to a problem that I have been able to replicate with an illustrative table. Here's how the illustrative table is created:
create table illustrativeTable
(
id INT UNSIGNED NOT NULL AUTO_INCREMENT,
label VARCHAR(4),
reportingDate DATE,
attr_1 INT,
attr_2 INT,
attr_3 INT,
PRIMARY KEY(id)
);
I have populated the illustrative table as follows:
INSERT INTO illustrativeTable(label, reportingDate, attr_1, attr_2, attr_3) VALUES('A', '2018-01-01', '1', '3', '100'),
('A', '2018-01-05', '2', '4', '125'), ('A', '2018-01-07', '2', '5', '125'),
('A', '2018-01-08', '3', '6', '150'), ('A', '2018-01-11', '4', '7', NULL),
('B', '2018-01-02', '1', '3', '50'), ('B', '2018-01-05', '2', '5', '75'),
('B', '2018-01-06', '3', '6', '200'), ('B', '2018-01-16', '3', '5', '200'),
('C', '2018-01-05', '6', '9', '175'),('C', '2018-01-08', '7', '9', '225'),
('D', '2018-01-01', '2', '5', '55'), ('D', '2018-01-15', '3', '6', 85),
('D', '2018-01-21', '4', '7', '75'), ('E', '2018-01-25', '2', '4', '65'),
('E', '2018-01-28', '2', '5', NULL);
The query
SELECT * FROM illustrativeTable;
yields the following output:
+----+-------+---------------+--------+--------+--------+
| id | label | reportingDate | attr_1 | attr_2 | attr_3 |
+----+-------+---------------+--------+--------+--------+
| 1 | A | 2018-01-01 | 1 | 3 | 100 |
| 2 | A | 2018-01-05 | 2 | 4 | 125 |
| 3 | A | 2018-01-07 | 2 | 5 | 125 |
| 4 | A | 2018-01-08 | 3 | 6 | 150 |
| 5 | A | 2018-01-11 | 4 | 7 | NULL |
| 6 | B | 2018-01-02 | 1 | 3 | 50 |
| 7 | B | 2018-01-05 | 2 | 5 | 75 |
| 8 | B | 2018-01-06 | 3 | 6 | 200 |
| 9 | B | 2018-01-16 | 3 | 5 | 200 |
| 10 | C | 2018-01-05 | 6 | 9 | 175 |
| 11 | C | 2018-01-08 | 7 | 9 | 225 |
| 12 | D | 2018-01-01 | 2 | 5 | 55 |
| 13 | D | 2018-01-15 | 3 | 6 | 85 |
| 14 | D | 2018-01-21 | 4 | 7 | 75 |
| 15 | E | 2018-01-25 | 2 | 4 | 65 |
| 16 | E | 2018-01-28 | 2 | 5 | NULL |
+----+-------+---------------+--------+--------+--------+
Our issue is that we want to retrieve label, reportingDate and attr_3, from selected tuples, with the following constraints:
1) attr_2 - attr_1 = 3
2) attr_3 IS NOT NULL
3) In case of multiple hits, the value with the highest value for reportingDate is selected
The simplistic query:
SELECT label, reportingDate, attr_3 FROM illustrativeTable
WHERE label IN ('A', 'B', 'C', 'E') AND (attr_2-attr_1=3)
AND attr_3 IS NOT NULL GROUP BY label;
yields the following result:
+-------+---------------+--------+
| label | reportingDate | attr_3 |
+-------+---------------+--------+
| A | 2018-01-07 | 125 |
| B | 2018-01-05 | 75 |
| C | 2018-01-05 | 175 |
+-------+---------------+--------+
The problem with this result is that for label 'A' the highest reportingDate that meets all the constraints is 2018-01-08. Similarly for label 'B' the highest reportingDate that meets all the constraints is 2018-01-06.
We would like to tweak the query so that the output looks as follows:
+-------+---------------+--------+
| label | reportingDate | attr_3 |
+-------+---------------+--------+
| A | 2018-01-08 | 150 |
| B | 2018-01-06 | 200 |
| C | 2018-01-05 | 175 |
+-------+---------------+--------+
I did try some ideas from https://paulund.co.uk/get-last-record-in-each-mysql-group but I could not get the results I am looking for.
Upvotes: 1
Views: 64
Reputation: 147146
To get the output for the highest reportingDate
, you just need to add that as a constraint to the query. Note that unless you have multiple data values for a given reportingDate
, you don't need a GROUP BY
clause:
SELECT label, reportingDate, attr_3
FROM illustrativeTable it1
WHERE label IN ('A', 'B', 'C', 'E') AND
(attr_2-attr_1=3) AND
attr_3 IS NOT NULL AND
reportingDate = (SELECT MAX(reportingDate)
FROM illustrativeTable
WHERE label = it1.label AND
attr_2-attr_1=3 AND
attr_3 IS NOT NULL)
Output:
label reportingDate attr_3
A 2018-01-08 150
B 2018-01-06 200
C 2018-01-05 175
If you do have multiple values for a given reportingDate
, you will need to GROUP BY label
, and you will also have to decide whether you want the minimum or maximum value of attr_3
, in which case you would change attr_3
in the query to MIN(attr_3)
or MAX(attr_3)
respectively.
Update
Based on the additional criteria specified by OP in a comment below, this is probably the most efficient query to get the desired result. It joins a sub-select of the initial table (with the non-aggregating conditions applied) to two other tables which give the maximum reportingDate
by label
and the maximum value of attr2
by reportingDate
and label
respectively, using the JOIN
condition to then filter out all entries which do not match MAX(reportingDate)
and MAX(attr_2)
.
SELECT it1.label, it1.reportingDate, it1.attr_3
FROM (SELECT *
FROM illustrativeTable
WHERE label IN ('A', 'B', 'C', 'E') AND
(attr_2-attr_1=3) AND
attr_3 IS NOT NULL) it1
JOIN (SELECT label, MAX(reportingDate) AS max_reportingDate
FROM illustrativeTable it1
WHERE attr_2-attr_1=3 AND attr_3 IS NOT NULL
GROUP BY label) it2
ON it2.label = it1.label AND it2.max_reportingDate = it1.reportingDate
JOIN (SELECT label, reportingDate, MAX(attr_2) AS max_attr_2
FROM illustrativeTable it1
WHERE attr_2-attr_1=3 AND attr_3 IS NOT NULL
GROUP BY label, reportingDate) it3
ON it3.label = it1.label AND it3.reportingDate = it1.reportingDate AND it3.max_attr_2 = it1.attr_2
ORDER BY it1.label
For the sample data the output remains the same however I have tested it with data which triggers the MAX(attr_2)
condition on rextester.
Upvotes: 2