Reputation: 15950
I have an unusual query, which got me stuck right now
Table Fields are:
id bigint 20
name varchar 255
desc text
There are many records with same name and desc, but desc have some extra spaces in between words
like
1 't1' 'hello world'
2 't2' 'hello world'
I need to find those rows that have similar data
How can I find these, thanks.
Upvotes: 0
Views: 8600
Reputation: 1258
This is pretty close. Assuming:
+-------+---------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+---------+------+-----+---------+-------+
| d | text | YES | | NULL | |
| id | int(11) | YES | | NULL | |
+-------+---------+------+-----+---------+-------+
Then this query:
select x.id,x2.id,x.d,x2.d from x left join x as x2 on replace(x.d," ","") = replace(x2.d," ","") and x.id != x2.id having !(x2.id is null);
Gets you the duplicate rows. It fails if you have "Helloworld" (i.e. with no space) and you don't want that to match.
Upvotes: 2
Reputation: 360572
Unless you need to preserve the original data, something like this is best done at insertion time, when you create/update the record, rather than later, at comparison time.
That being said, you could do something like
SELECT id, name, desc, REPLACE(desc, ' ', ' ') as replaced
xx x <--note the number of spaces
FROM table
GROUP replaced
HAVING replaced > 1
Probably won't perfectly, and you'll have to adjust the replacement portion a few times, but this should get you started.
Upvotes: 0