Reputation: 57
I need to write a query in MySQL that retrieves all rows from the Downloads table for which the Downloaded_at field is older than 12 months for each unique reference number (Ref), except for the most recent row out of that dataset. So it has to be grouped by the reference number (Ref) rather than Id. The Downloads table looks roughly like:
*-----Id-------Ref--------Downloaded_At----------------File_Name------*
1 4WP 2009-08-14 10:13:58 test.txt
2 3K8 2009-05-21 11:11:10 test2.txt
3 3K8 2011-12-12 08:12:09 test3.txt
For example, downloads exist for ref ABC with downloaded at timestamps of 01Dec12 8.30, 01Dec12 9.30, 01Dec12 10.30. Then the 01Dec12 10.30 record should NOT appear in the query as it's the most recent one.
I think it'd be an easy query to write if I was grouping by Id instead of Ref to get the results, but as it stands I'm struggling a bit. I have this query so far, which (theoretically) should return the correct results but takes FAR too long on my test database - over 400 seconds before I just cancel it and give up.
SELECT * FROM downloads
WHERE Downloaded_At < DATE_SUB(NOW(), INTERVAL 1 YEAR)
AND Id NOT IN
(SELECT Id FROM downloads INNER JOIN (select Id, Ref, MAX(Downloaded_At)
FROM downloads
WHERE Downloaded_At < DATE_SUB(NOW(), INTERVAL 1 YEAR)
GROUP BY Ref) dl on dl.id = downloads.id);
Can anyone suggest a better / more efficient way to get the results I'm after?
Upvotes: 0
Views: 1158
Reputation: 1269703
If I understand correctly, you want to filter the most recent of the download records for each ref. I don't think your query will do this.
SELECT *
FROM downloads d
WHERE Downloaded_At < DATE_SUB(NOW(), INTERVAL 1 YEAR) and
Downloaded_At < (SELECT max(Downloaded_At)
FROM downloads d2
WHERE d2.ref = d.ref
);
This query will run faster with an index on downloads(ref, Downloaded_At)
.
Upvotes: 2