Luke
Luke

Reputation: 707

MySQL: For each row in table, change one row in another table

I have two tables :

SELECT * FROM data;
+----+---+---+
| id | c | g |
+----+---+---+
|  1 | 1 | 2 |
|  2 | 1 | 2 |
|  3 | 1 | 2 |
|  4 | 1 | 3 |
|  5 | 2 | 2 |
|  6 | 2 | 3 |
|  7 | 2 | 3 |
+----+---+---+

and

SELECT * FROM changes;
+----+-------+-------+---+
| id | c_old | c_new | g |
+----+-------+-------+---+
|  1 |     1 |     2 | 2 |
|  2 |     2 |     1 | 3 |
|  3 |     1 |     2 | 2 |
+----+-------+-------+---+

For each row in changes I need to change exactly one row in data where data.g=changes.g and data.c=changes.c_old. (assume that there will always be enough matches)

I'm attempting to do this with this query:

UPDATE 
data INNER JOIN changes ON 
  data.c=changes.c_old AND p.g=changes.g 
SET data.c_id=changes.c_new 
WHERE data.id IN( 
  SELECT id FROM (
    SELECT data.id from 
    data INNER JOIN changes ON
      data.c=changes.c_old AND data.g=changes.g 
    GROUP BY changes.id
  ) AS another_table
)

Now, I'm surprised that that mess of a query runs at all. But, it doesn't do what I need. The innermost select returns this table:

+----+
| id |
+----+
|  1 |
|  6 |
|  1 |
+----+

Notice that 1 appears twice. That means only two rows got changed (or row 1 got changed twice) when I needed three to change. Is there a way to make sure each id in that sub-query is unique? Is there a better way to accomplish this mess?

Thanks in advance!

Upvotes: 1

Views: 6178

Answers (2)

Xint0
Xint0

Reputation: 5399

Let's divide the process into two tasks:

  1. Getting the id of the rows to update in data and the value to set c.
  2. Updating the values.

Task 1 can be achieved by the following query (note the use of distinct to get rid of duplicates):

select distinct d.id, c.c_new
from
    data as d
    inner join changes as c
        on  d.c = c.c_old
        and d.g = c.g

This should give you the following:

| id | c_new |
|----|-------|
|  1 |     2 |
|  2 |     2 |
|  3 |     2 |
|  6 |     1 |
|  7 |     1 |

Now for the update just use the previous query as a derived table expression and join it to data:

update
    data as da
    inner join (
        select distinct d.id, c.c_new
        from
            data as d
            inner join changes as c
            on  d.c = c.c_old
            and d.g = c.g
    ) as dc
    on da.id = dc.id
set d.c = dc.c_new;

And you are done, final result:

| ID | C | G |
|----|---|---|
|  1 | 2 | 2 |
|  2 | 2 | 2 |
|  3 | 2 | 2 |
|  4 | 1 | 3 |
|  5 | 2 | 2 |
|  6 | 1 | 3 |
|  7 | 1 | 3 |

Upvotes: 0

mhoglan
mhoglan

Reputation: 371

You are selecting a field that is not part of the group by or being aggregated.

SELECT data.id from 
data INNER JOIN changes ON
    data.c=changes.c_old AND data.g=changes.g 
GROUP BY changes.id

You should use an aggregate function on the data.id in the select, or add data.id to the groupby (though I suspect that is not the result you want either)

The INNER JOIN is result in this dataset

+---------+--------+--------+------------+---------------+---------------+-----------+
| data.id | data.c | data.g | changes.id | changes.c_old | changes.c_new | changes.g |
+---------+--------+--------+------------+---------------+---------------+-----------+
|       1 |      1 |      2 |          1 |             1 |             2 |         2 |
|       1 |      1 |      2 |          3 |             1 |             2 |         2 |
|       2 |      1 |      2 |          1 |             1 |             2 |         2 |
|       2 |      1 |      2 |          3 |             1 |             2 |         2 |
|       3 |      1 |      2 |          1 |             1 |             2 |         2 |
|       3 |      1 |      2 |          3 |             1 |             2 |         2 |
|       6 |      2 |      3 |          2 |             2 |             1 |         3 |
|       7 |      2 |      3 |          2 |             2 |             1 |         3 |
+---------+--------+--------+------------+---------------+---------------+-----------+

1,2,3 are expanded out due to multiple matches in the join, and 4,5 are eliminated due to no match

You then are grouping by changes.id, which is going to result in (showing with values in CSV list after grouping)

+---------+--------+--------+------------+---------------+---------------+-----------+
| data.id | data.c | data.g | changes.id | changes.c_old | changes.c_new | changes.g |
+---------+--------+--------+------------+---------------+---------------+-----------+
|   1,2,3 |  1,1,1 |  2,2,2 |          1 |         1,1,1 |         2,2,2 |     2,2,2 |
|   1,2,3 |  1,1,1 |  2,2,2 |          3 |         1,1,1 |         2,2,2 |     2,2,2 |
|     6,7 |    2,2 |    3,3 |          2 |           2,2 |           1,1 |       3,3 |
+---------+--------+--------+------------+---------------+---------------+-----------+

Since no aggregate or deterministic way of choosing the values from the available options, you are getting the 1 from data.id chosen for both changes.id 1 and 3

Depending on what you are wanting, are you wanting 3 rows? all distinct values? you should add that deterministic behavior to the select.

btw, I am pretty sure other SQL engines would not allow that select (such as MSSQL) because its ambiguous. As for MySQL behavior in that situation, I believe it chooses the first value from the first row stored, and thus why you probably get 1 in both cases, but it is free to choose whatever value it wishes.

http://dev.mysql.com/doc/refman/5.7/en/group-by-extensions.html

MySQL extends the use of GROUP BY so that the select list can refer to nonaggregated columns not named in the GROUP BY clause. This means that the preceding query is legal in MySQL. You can use this feature to get better performance by avoiding unnecessary column sorting and grouping. However, this is useful primarily when all values in each nonaggregated column not named in the GROUP BY are the same for each group. The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate. Furthermore, the selection of values from each group cannot be influenced by adding an ORDER BY clause. Sorting of the result set occurs after values have been chosen, and ORDER BY does not affect which values within each group the server chooses.

Upvotes: 1

Related Questions