Reputation: 707
I have two tables :
SELECT * FROM data;
+----+---+---+
| id | c | g |
+----+---+---+
| 1 | 1 | 2 |
| 2 | 1 | 2 |
| 3 | 1 | 2 |
| 4 | 1 | 3 |
| 5 | 2 | 2 |
| 6 | 2 | 3 |
| 7 | 2 | 3 |
+----+---+---+
and
SELECT * FROM changes;
+----+-------+-------+---+
| id | c_old | c_new | g |
+----+-------+-------+---+
| 1 | 1 | 2 | 2 |
| 2 | 2 | 1 | 3 |
| 3 | 1 | 2 | 2 |
+----+-------+-------+---+
For each row in changes
I need to change exactly one row in data
where data.g=changes.g and data.c=changes.c_old
. (assume that there will always be enough matches)
I'm attempting to do this with this query:
UPDATE
data INNER JOIN changes ON
data.c=changes.c_old AND p.g=changes.g
SET data.c_id=changes.c_new
WHERE data.id IN(
SELECT id FROM (
SELECT data.id from
data INNER JOIN changes ON
data.c=changes.c_old AND data.g=changes.g
GROUP BY changes.id
) AS another_table
)
Now, I'm surprised that that mess of a query runs at all. But, it doesn't do what I need. The innermost select returns this table:
+----+
| id |
+----+
| 1 |
| 6 |
| 1 |
+----+
Notice that 1
appears twice. That means only two rows got changed (or row 1 got changed twice) when I needed three to change. Is there a way to make sure each id in that sub-query is unique? Is there a better way to accomplish this mess?
Thanks in advance!
Upvotes: 1
Views: 6178
Reputation: 5399
Let's divide the process into two tasks:
id
of the rows to update in data
and the value to set c
.Task 1 can be achieved by the following query (note the use of distinct
to get rid of duplicates):
select distinct d.id, c.c_new
from
data as d
inner join changes as c
on d.c = c.c_old
and d.g = c.g
This should give you the following:
| id | c_new |
|----|-------|
| 1 | 2 |
| 2 | 2 |
| 3 | 2 |
| 6 | 1 |
| 7 | 1 |
Now for the update just use the previous query as a derived table expression and join it to data
:
update
data as da
inner join (
select distinct d.id, c.c_new
from
data as d
inner join changes as c
on d.c = c.c_old
and d.g = c.g
) as dc
on da.id = dc.id
set d.c = dc.c_new;
And you are done, final result:
| ID | C | G |
|----|---|---|
| 1 | 2 | 2 |
| 2 | 2 | 2 |
| 3 | 2 | 2 |
| 4 | 1 | 3 |
| 5 | 2 | 2 |
| 6 | 1 | 3 |
| 7 | 1 | 3 |
Upvotes: 0
Reputation: 371
You are selecting a field that is not part of the group by or being aggregated.
SELECT data.id from
data INNER JOIN changes ON
data.c=changes.c_old AND data.g=changes.g
GROUP BY changes.id
You should use an aggregate function on the data.id in the select, or add data.id to the groupby (though I suspect that is not the result you want either)
The INNER JOIN is result in this dataset
+---------+--------+--------+------------+---------------+---------------+-----------+
| data.id | data.c | data.g | changes.id | changes.c_old | changes.c_new | changes.g |
+---------+--------+--------+------------+---------------+---------------+-----------+
| 1 | 1 | 2 | 1 | 1 | 2 | 2 |
| 1 | 1 | 2 | 3 | 1 | 2 | 2 |
| 2 | 1 | 2 | 1 | 1 | 2 | 2 |
| 2 | 1 | 2 | 3 | 1 | 2 | 2 |
| 3 | 1 | 2 | 1 | 1 | 2 | 2 |
| 3 | 1 | 2 | 3 | 1 | 2 | 2 |
| 6 | 2 | 3 | 2 | 2 | 1 | 3 |
| 7 | 2 | 3 | 2 | 2 | 1 | 3 |
+---------+--------+--------+------------+---------------+---------------+-----------+
1,2,3 are expanded out due to multiple matches in the join, and 4,5 are eliminated due to no match
You then are grouping by changes.id, which is going to result in (showing with values in CSV list after grouping)
+---------+--------+--------+------------+---------------+---------------+-----------+
| data.id | data.c | data.g | changes.id | changes.c_old | changes.c_new | changes.g |
+---------+--------+--------+------------+---------------+---------------+-----------+
| 1,2,3 | 1,1,1 | 2,2,2 | 1 | 1,1,1 | 2,2,2 | 2,2,2 |
| 1,2,3 | 1,1,1 | 2,2,2 | 3 | 1,1,1 | 2,2,2 | 2,2,2 |
| 6,7 | 2,2 | 3,3 | 2 | 2,2 | 1,1 | 3,3 |
+---------+--------+--------+------------+---------------+---------------+-----------+
Since no aggregate or deterministic way of choosing the values from the available options, you are getting the 1 from data.id chosen for both changes.id 1 and 3
Depending on what you are wanting, are you wanting 3 rows? all distinct values? you should add that deterministic behavior to the select.
btw, I am pretty sure other SQL engines would not allow that select (such as MSSQL) because its ambiguous. As for MySQL behavior in that situation, I believe it chooses the first value from the first row stored, and thus why you probably get 1 in both cases, but it is free to choose whatever value it wishes.
http://dev.mysql.com/doc/refman/5.7/en/group-by-extensions.html
MySQL extends the use of GROUP BY so that the select list can refer to nonaggregated columns not named in the GROUP BY clause. This means that the preceding query is legal in MySQL. You can use this feature to get better performance by avoiding unnecessary column sorting and grouping. However, this is useful primarily when all values in each nonaggregated column not named in the GROUP BY are the same for each group. The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate. Furthermore, the selection of values from each group cannot be influenced by adding an ORDER BY clause. Sorting of the result set occurs after values have been chosen, and ORDER BY does not affect which values within each group the server chooses.
Upvotes: 1