Tom Sprud
Tom Sprud

Reputation: 57

MySQL - Retrieve all records older than 12 months except most recent

I need to write a query in MySQL that retrieves all rows from the Downloads table for which the Downloaded_at field is older than 12 months for each unique reference number (Ref), except for the most recent row out of that dataset. So it has to be grouped by the reference number (Ref) rather than Id. The Downloads table looks roughly like:

*-----Id-------Ref--------Downloaded_At----------------File_Name------*

  1      4WP    2009-08-14 10:13:58    test.txt
  2      3K8    2009-05-21 11:11:10    test2.txt
  3      3K8    2011-12-12 08:12:09    test3.txt

For example, downloads exist for ref ABC with downloaded at timestamps of 01Dec12 8.30, 01Dec12 9.30, 01Dec12 10.30. Then the 01Dec12 10.30 record should NOT appear in the query as it's the most recent one.

I think it'd be an easy query to write if I was grouping by Id instead of Ref to get the results, but as it stands I'm struggling a bit. I have this query so far, which (theoretically) should return the correct results but takes FAR too long on my test database - over 400 seconds before I just cancel it and give up.

SELECT * FROM downloads 
WHERE Downloaded_At < DATE_SUB(NOW(), INTERVAL 1 YEAR)
    AND Id NOT IN 
       (SELECT Id FROM downloads INNER JOIN (select Id, Ref, MAX(Downloaded_At) 
        FROM downloads
        WHERE Downloaded_At < DATE_SUB(NOW(), INTERVAL 1 YEAR)
        GROUP BY Ref) dl on dl.id = downloads.id);

Can anyone suggest a better / more efficient way to get the results I'm after?

Upvotes: 0

Views: 1158

Answers (1)

Gordon Linoff
Gordon Linoff

Reputation: 1269703

If I understand correctly, you want to filter the most recent of the download records for each ref. I don't think your query will do this.

SELECT *
FROM downloads d
WHERE Downloaded_At < DATE_SUB(NOW(), INTERVAL 1 YEAR) and
      Downloaded_At < (SELECT max(Downloaded_At)
                       FROM downloads d2
                       WHERE d2.ref = d.ref
                      );

This query will run faster with an index on downloads(ref, Downloaded_At).

Upvotes: 2

Related Questions