stack
stack

Reputation: 10228

How can I both use DISTINCT and don't care about one column's value?

Here is my query:

INSERT INTO comments (root_id, r_id, user_id, comment_content)
SELECT DISTINCT root_id, r_id,user_id, comment_content FROM comments_old WHERE 1

Noted that I've a unique index on comments(root_id, r_id, user_id) and query above fails because DISTINCT cares about duplicate comment_content column values and it is not exist in that unique index. (because its type is TEXT and I cannot add it into the unique index).

Anyway, I want to select comment_content but don't care about being duplicate of this column. How can I do that?

Upvotes: 0

Views: 205

Answers (4)

user359040
user359040

Reputation:

Thanks to a feature of MySQL, the following should also work*:

INSERT INTO comments (root_id, r_id, user_id, comment_content)
SELECT root_id, r_id, user_id, comment_content
from comments_old
group by root_id, r_id, user_id

This is because MySQL will allow ungrouped, unaggregated values to be included in a GROUPed select - the value selected is essentially random (within the values for the specific grouping).

.* - in versions of MySQL prior to 5.7.5, or in which ONLY_FULL_GROUP_BY is not enabled - see MySQL Documentation for further details.

Upvotes: 1

Solarflare
Solarflare

Reputation: 11106

You can also use a slightly different approach and react to the violation of the unique key.

You can e.g. use insert ignore

INSERT IGNORE INTO comments (root_id, r_id, user_id, comment_content)
SELECT DISTINCT root_id, r_id,user_id, comment_content 
FROM comments_old WHERE 1;

or on duplicate key update

INSERT INTO comments (root_id, r_id, user_id, comment_content)
SELECT DISTINCT root_id, r_id,user_id, comment_content 
FROM comments_old WHERE 1
ON DUPLICATE KEY UPDATE comment_content = values(comment_content);

insert ignore will also ignore other kind of errors, while ON DUPLICATE KEY UPDATE (or ON DUPLICATE KEY IGNORE) will only skip (or handle) unique key violations. If you use update, it will, in this case, always take the newest value (so will overwrite existing rows).

An important difference to using group by is that this will also ignore a new row if a row with that key is already in the table - which is not exactly what you asked for, but may be what you are looking for.

Upvotes: 1

Use a subquery containing the distinct stuff, and join it to the table itself querying the comment_content values.

INSERT INTO comments (root_id, r_id, user_id, comment_content)
SELECT a.root_id, a.r_id, a.user_id, b.comment_content FROM
    (SELECT DISTINCT root_id, r_id,user_id FROM comments_old WHERE 1) a, comments_old b
    WHERE a.root_id = b.root_id and a.r_id = b.r_id and a.use_id = b.user_id

This will give you two lines if two different comment_content values exist for the same set of keys.

Upvotes: 1

Damien_The_Unbeliever
Damien_The_Unbeliever

Reputation: 239764

In most dialects of SQL, there's no way to express "pick a value, I don't care which". There may be a mysql dialect way but I'd expect the standard:

select root_id,r_id,user_id,MAX(comment_content)
from comments_old
group by root_id,r_id,user_id

should work. Of course, MAX is somewhat arbitrary, MIN could also be used - but you said you didn't care what value was picked.

Upvotes: 1

Related Questions