Kristen Grote
Kristen Grote

Reputation: 2777

MySQL: Transfer Data Based on a Column Without Also Transferring That Column

My table stores revision data for my CMS entries. Each entry has an ID and a revision date, and there are multiple revisions:

Table: old_revisions
+----------+---------------+-----------------------------------------+
| entry_id | revision_date |               entry_data                |
+----------+---------------+-----------------------------------------+
|        1 |    1302150011 | I like pie.                             |
|        1 |    1302148411 | I like pie and cookies.                 |
|        1 |    1302149885 | I like pie and cookies and cake.        |
|        2 |    1288917372 | Kittens are cute.                       |
|        2 |    1288918782 | Kittens are cute but puppies are cuter. |
|        3 |    1288056095 | Han shot first.                         |
+----------+---------------+-----------------------------------------+

I want to transfer some of this data to another table:

Table: new_revisions
+--------------+----------------+
| new_entry_id | new_entry_data |
+--------------+----------------+
|              |                |
+--------------+----------------+

I want to transfer entry_id and entry_data to new_entry_id and new_entry_data. But I only want to transfer the most recent version of each entry.

I got as far as this query:

INSERT INTO new_revisions (
    new_entry_id,
    new_entry_data
)
SELECT
entry_id,
entry_data,
MAX(revision_date)
FROM old_revisions
GROUP BY entry_id

But I think the problem is that I'm trying to insert 3 columns of data into 2 columns.

How do I transfer the data based on the revision date without transferring the revision date as well?

Upvotes: 0

Views: 84

Answers (2)

Taryn
Taryn

Reputation: 247810

You can use the following query:

insert into new_revisions (new_entry_id, new_entry_data)
select o1.entry_id, o1.entry_data
from old_revisions o1
inner join
(
  select max(revision_date) maxDate, entry_id
  from old_revisions
  group by entry_id
) o2
  on o1.entry_id = o2.entry_id
  and o1.revision_date = o2.maxDate

See SQL Fiddle with Demo. This query gets the max(revision_date) for each entry_id and then joins back to your table on both the entry_id and the max date to get the rows to be inserted.

Please note that the subquery is only returning the entry_id and date, this is because we want to apply the GROUP BY to the items in the select list that are not in an aggregate function. MySQL uses an extension to the GROUP BY clause that allows columns in the select list to be excluded in a group by and aggregate but this could causes unexpected results. By only including the columns needed by the aggregate and the group by will ensure that the result is the value you want. (see MySQL Extensions to GROUP BY)

From the MySQL Docs:

MySQL extends the use of GROUP BY so that the select list can refer to nonaggregated columns not named in the GROUP BY clause. ... You can use this feature to get better performance by avoiding unnecessary column sorting and grouping. However, this is useful primarily when all values in each nonaggregated column not named in the GROUP BY are the same for each group. The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate. Furthermore, the selection of values from each group cannot be influenced by adding an ORDER BY clause. Sorting of the result set occurs after values have been chosen, and ORDER BY does not affect which values the server chooses.

Upvotes: 3

Barranka
Barranka

Reputation: 21047

If you want to enter the last entry you need to filter it before:

select entry_id, max(revision_date) as maxDate
from old_revisions
group by entry_id;

Then use this as a subquery to filter the data you need:

insert into new_revisions (new_entry_id, new_entry_data)
select entry_id, entry_data
from old_revisions as o
    inner join (
        select entry_id, max(revision_date) as maxDate
        from old_revisions
        group by entry_id
    ) as a on o.entry_id = a.entry_id and o.revision_date = a.maxDate

Upvotes: 2

Related Questions