Reputation: 13
I have problem with creating query, which getting no duplicate values form my table. Unfortunately, Full Name column has Name and Surname in different order.
For example:
+----+----------------------+
| ID | Full Name |
+----+----------------------+
| 1 | Marshall Wilson |
| 2 | Wilson Marshall |
| 3 | Lori Hill |
| 4 | Hill Lori |
| 5 | Casey Dean Davidson |
| 6 | Davidson Casey Dean |
+----+----------------------+
I would like to get that result:
+----+-----------------------+
| ID | Full Name |
+----+-----------------------+
| 1 | Marshall Wilson |
| 3 | Lori Hill |
| 5 | Casey Dean Davidson |
+----+-----------------------+
My target is to create query, which getting in similar way, for example: select distinct for Name and Surname in the same order.
Any thoughts?
Upvotes: 1
Views: 76
Reputation: 28834
It requires lots of String operations, and usage of multiple Derived Tables. It may not be efficient.
We first tokenize the FullName
into multiple words it is made out of. For that we use a number generator table gen
. In this case, I have assumed that maximum number of substrings is 3. You can easily extend it further by adding more Selects, like, SELECT 4 UNION ALL ..
and so on.
We use Substring_Index()
with Replace()
function to get a substring out, using a single space character (' '
) as Delimiter. Trim()
is used to remove any leading/trailing spaces left.
Now, the trick is to use this result-set as a Derived table, and do a Group_Concat()
on the words such that they are sorted in a ascending order. This way even the duplicate names (but substrings in different order), will get similar words_sorted
value. Eventually, we simply need to Group By
on words_sorted
to weed out the duplicates.
Query #1
SELECT
MIN(dt2.ID) AS ID,
MIN(dt2.FullName) AS FullName
FROM
(
SELECT
dt1.ID,
dt1.FullName,
GROUP_CONCAT(IF(word = '', NULL, word) ORDER BY word ASC) words_sorted
FROM
(
SELECT e.ID,
e.FullName,
TRIM(REPLACE(
SUBSTRING_INDEX(e.FullName, ' ', gen.idx),
SUBSTRING_INDEX(e.FullName, ' ', gen.idx-1),
'')) AS word
FROM employees AS e
CROSS JOIN (SELECT 1 AS idx UNION ALL
SELECT 2 UNION ALL
SELECT 3) AS gen -- You can add more numbers if more than 3 substrings
) AS dt1
GROUP BY dt1.ID, dt1.FullName
) AS dt2
GROUP BY dt2.words_sorted
ORDER BY ID;
| ID | FullName |
| --- | ------------------- |
| 1 | Marshall Wilson |
| 3 | Hill Lori |
| 5 | Casey Dean Davidson |
Upvotes: 1