fioljnw
fioljnw

Reputation: 13

MySql: Select Distinct for words in different order

I have problem with creating query, which getting no duplicate values form my table. Unfortunately, Full Name column has Name and Surname in different order.

For example:

+----+----------------------+
| ID | Full Name            |
+----+----------------------+
| 1  | Marshall Wilson      |
| 2  | Wilson Marshall      |
| 3  | Lori Hill            |
| 4  | Hill Lori            |
| 5  | Casey Dean Davidson  |
| 6  | Davidson Casey Dean  |
+----+----------------------+

I would like to get that result:

+----+-----------------------+
| ID | Full Name             |
+----+-----------------------+
| 1  | Marshall Wilson       |
| 3  | Lori Hill             |
| 5  | Casey Dean Davidson   |
+----+-----------------------+

My target is to create query, which getting in similar way, for example: select distinct for Name and Surname in the same order.

Any thoughts?

Upvotes: 1

Views: 76

Answers (1)

Madhur Bhaiya
Madhur Bhaiya

Reputation: 28834

It requires lots of String operations, and usage of multiple Derived Tables. It may not be efficient.

We first tokenize the FullName into multiple words it is made out of. For that we use a number generator table gen. In this case, I have assumed that maximum number of substrings is 3. You can easily extend it further by adding more Selects, like, SELECT 4 UNION ALL .. and so on.

We use Substring_Index() with Replace() function to get a substring out, using a single space character (' ') as Delimiter. Trim() is used to remove any leading/trailing spaces left.

Now, the trick is to use this result-set as a Derived table, and do a Group_Concat() on the words such that they are sorted in a ascending order. This way even the duplicate names (but substrings in different order), will get similar words_sorted value. Eventually, we simply need to Group By on words_sorted to weed out the duplicates.


Query #1

SELECT 
  MIN(dt2.ID) AS ID, 
  MIN(dt2.FullName) AS FullName 
FROM 
(
SELECT 
  dt1.ID, 
  dt1.FullName, 
  GROUP_CONCAT(IF(word = '', NULL, word) ORDER BY word ASC) words_sorted 
FROM 
(
SELECT e.ID, 
       e.FullName, 
       TRIM(REPLACE(
         SUBSTRING_INDEX(e.FullName, ' ', gen.idx), 
         SUBSTRING_INDEX(e.FullName, ' ', gen.idx-1),
         '')) AS word 
FROM employees AS e
CROSS JOIN (SELECT 1 AS idx UNION ALL 
            SELECT 2 UNION ALL 
            SELECT 3) AS gen -- You can add more numbers if more than 3 substrings
) AS dt1 
GROUP BY dt1.ID, dt1.FullName
) AS dt2
GROUP BY dt2.words_sorted
ORDER BY ID;

| ID  | FullName            |
| --- | ------------------- |
| 1   | Marshall Wilson     |
| 3   | Hill Lori           |
| 5   | Casey Dean Davidson |

View on DB Fiddle

Upvotes: 1

Related Questions