Wilf
Wilf

Reputation: 2315

How to find duplicate rows with similar part of string

I have thousands of rows in a table. Some rows have similar keywords but can be categorized in the same group. For example:

Table : Birds_Name
    +-------+---------------------+
    |ID     |Name                 |
    +-------+---------------------+
    |1      |Blue Peckwood        |
    +-------+---------------------+
    |2      |North Peckwood       |
    +-------+---------------------+
    |3      |Northern Peckwood    |
    +-------+---------------------+
    |4      |Northern Peckwood    |
    +-------+---------------------+
    |5      |Red Heron            |
    +-------+---------------------+
    |6      |Red Heron            |
    +-------+---------------------+

As for the table above there should be 2 groups of birds. They are Peckwood and Heron.

But after I run this mySQL I get :

SELECT *
FROM birds_name
WHERE name IN (
    SELECT name
    FROM birds_name
    GROUP BY name
    HAVING COUNT(*) > 1
)

After I run the query. This is what I've got:

    +-------+---------------------+
    |3      |Northern Peckwood    |
    +-------+---------------------+
    |4      |Northern Peckwood    |
    +-------+---------------------+
    |5      |Red Heron            |
    +-------+---------------------+
    |6      |Red Heron            |
    +-------+---------------------+

Actually, I expect any row which share a similar string to be chosen (in this case it's Peckwood. So it should have only 2 groups - Peckwood and Heron.

Is it possible to do so? And how to adapt mysql code to achieve it?

Regards.

Upvotes: 0

Views: 1935

Answers (3)

Bla...
Bla...

Reputation: 7288

I think you can separate those words using MySQL String functions, like below:

mysql> SELECT SUBSTRING_INDEX('www.mysql.com', '.', 2);
        -> 'www.mysql'
mysql> SELECT SUBSTRING_INDEX('www.mysql.com', '.', -2);
        -> 'mysql.com'

Then, use it in the GROUP BY clause of your query.

UPDATE :

Here is my SQLFiddle.

Upvotes: 0

Sadikhasan
Sadikhasan

Reputation: 18600

Try this

SELECT SUBSTRING_INDEX(name,' ',-1),count(*)
FROM birds_name
GROUP BY SUBSTRING_INDEX(name,' ',-1) HAVING count(*)>0;

Manual for SUBSTRING_INDEX function in mysql.

Upvotes: 2

noobdeveloper
noobdeveloper

Reputation: 420

Can you try this.

SELECT count(id),name
  FROM birds_name
 group by name
having count(id) >1

Thanks

SQL Fiddle

Upvotes: 0

Related Questions