Lex
Lex

Reputation: 396

Query to delete old records, with an exception

Lets say I have a table that keeps a history of prices, for example:

 _______________________________________________________
|Make  | Model |  Year  |  price  |    date_of_record   |
|--------------------------------------------------------
|Mazda |   6   |  2008  | 10,000  | 2013-05-13 00:00:00 |
|Mazda |   6   |  2008  | 11,000  | 2012-05-13 00:00:00 |
|Mazda |   6   |  2008  | 12,000  | 2011-05-13 00:00:00 |
|Mazda |   6   |  2008  | 11,750  | 2010-05-13 00:00:00 |
|Honda | Civic |  2000  | 3,000   | 2011-05-13 00:00:00 |
|Honda | Civic |  2000  | 4,000   | 2010-05-13 00:00:00 |
|Honda | Civic |  2000  | 5,000   | 2009-05-13 00:00:00 |
|Honda | Civic |  2000  | 5,050   | 2008-05-13 00:00:00 |
|Acura |   TL  |  2009  | 21,000  | 2010-05-13 00:00:00 |
|--------------------------------------------------------

I want to delete entries that are more than 2 years old unless the latest entry is more than 2 years old. It is hard to put into words, basically i want to end up with this:

 _______________________________________________________
|Make  | Model |  Year  |  price  |    date_of_record   |
|--------------------------------------------------------
|Mazda |   6   |  2008  | 10,000  | 2013-05-13 00:00:00 |
|Mazda |   6   |  2008  | 11,000  | 2012-05-13 00:00:00 |
|Honda | Civic |  2000  | 3,000   | 2011-05-13 00:00:00 |
|Acura |   TL  |  2009  | 21,000  | 2010-05-13 00:00:00 |
|--------------------------------------------------------

so the query would delete the entries that are more than two years old for the mazda, and the entries that are more than 2 years old for the Honda and Acura, with the exception of the latest.

Entries are inserted to the table only when the price changes, so if the price hasn't changed in 5 years the entry will be 5 years old, but that does not mean that the data is not accurate.

Upvotes: 1

Views: 134

Answers (1)

DRapp
DRapp

Reputation: 48169

First, lets get a list of all car, make, model, year and the respective latest "date of record". So, you could have a 2007 vehicle AND a 2008 of the same make/model each with a latest date entry of 2011-01-01. You would want to keep the vehicle entry for BOTH 2007 and 2008 models however, your example only showed the old dates would only want to keep the retained data for the Honda Civic and Acural TL respectively.

Run this, just to confirm those results

SELECT 
      c.`make`,
      c.`model`,
      c.`year`,
      MAX( c.date_of_record ) as LatestRecordDate
   from 
      YourCarPriceTable c
   group by
      c.`make`,
      c.`model`,
      c.`year`
   having
      MAX( c.date_of_record ) < DATE_ADD(CURDATE(),INTERVAL -2 YEAR)

Now, use this as a basis to delete from the current table via a Left Join on the same make, model and year. Two conditions to allow the delete.
1) There is NO Match found in the "KeepThese" subquery and the latest date of record is more than two years old...

2) There IS a match found in the "KeepThese" subquery and the date is older than the most recent that qualified the (too old) result set of KeepThese.

Per your data example, the "KeepThese" result set should include

Make   Model   Year   LatestRecordDate
Honda  Civic   2000   2011-05-13 00:00:00
Acura  TL      2009   2010-05-13 00:00:00

So, your Mazda 6 would delete the two entries for the 2010 and 2011 date entries since that make/model/year is NOT included in the list (via test IS NULL).

Your Honda Civic will delete all entries that are older than it's latest "KeepThese" value of 2011-05-13, and thus delete the 2008, 2009 and 2010 entries.

Your Acura TL does not have anything older than it's single entry, so it would be left alone.

DELETE YPT.* FROM
   YourCarPriceTable YPT
      LEFT JOIN ( SELECT 
                        c.`make`,
                        c.`model`,
                        c.`year`,
                        MAX( c.date_of_record ) as LatestRecordDate
                     from 
                        YourCarPriceTable c
                     group by
                        c.`make`,
                        c.`model`,
                        c.`year`
                     having
                        MAX( c.date_of_record ) < DATE_ADD(CURDATE(),INTERVAL -2 YEAR) ) KeepThese
         ON YPT.`make` = KeepThese.`make`
         AND YPT.`model` = KeepThese.`model`
         AND YPT.`year` = KeepThese.`year`
   where
      (     KeepThese.`make` IS NULL
        AND YPT.date_of_record < DATE_ADD(CURDATE(),INTERVAL -2 YEAR))
      OR
      (      YPT.`make` = KeepThese.`make`
         AND YPT.`model` = KeepThese.`model`
         AND YPT.`year` = KeepThese.`year`
         AND YPT.date_of_record < KeepThese.LatestRecordDate )

The subquery is processed once all up front, THEN left-joined to the delete from version of your car/price table.

Upvotes: 2

Related Questions