Daniel Lee
Daniel Lee

Reputation: 367

MYSQL - Combine rows with multiple duplicate values and delete duplicates afterwards

So I have my database set up as a single table. In that table I have collected source URL and description (I am scraping the product description from a number of pages). Unfortunately I have ended up with multiple rows in the database for a URL/source page if there is more than one paragraph.

What I would like to do is, if there are multiple rows with the same URL, combine the description from each row and then delete the duplicate rows for that URL.

My table is literally structured like so:

table             
+----+----------------------------+-------------+
| id | url                        | description |
+----+----------------------------+-------------+
|  1 | http://example.com/page-a  | paragraph 1 |
|  2 | http://example.com/page-a  | paragraph 2 |
|  3 | http://example.com/page-a  | paragraph 3 |
|  4 | http://example.com/page-b  | paragraph 1 |
|  5 | http://example.com/page-b  | paragraph 2 |
+----+----------------------------+-------------+

How I want it is like:

table             
+----+----------------------------+-------------------------------------+
| id | url                        | description                         |
+----+----------------------------+-------------------------------------+
|  1 | http://example.com/page-a  | paragraph 1 paragraph 2 paragraph 3 |
|  2 | http://example.com/page-b  | paragraph 1 paragraph 2             |
+----+----------------------------+-------------------------------------+

I'm not so bothered about the IDs being updated to be correct, I just want to be able to combine the rows where the paragraphs should be in the same field as they are the same URLs, and then delete the duplicates.

Any help would be greatly appreciated!

Upvotes: 3

Views: 198

Answers (3)

Gordon Linoff
Gordon Linoff

Reputation: 1269753

Create a new temporary table, truncate the original, and re-insert the data:

create temporary table tempt as
    select (@rn := @rn + 1) as id, url,
           group_concat(description order by id separator ' ') as description
    from t cross join (select @rn := 0) params
    group by url 
    order by min(id);

-- Do lots of testing and checking here to be sure you have the data you want.

truncate table t;

insert into t(id, url, description)
    select id, url, description
    from tempt;

If id is already auto-incremented in the table, then you do not need to provide a value for it.

Upvotes: 1

Sabyasachi Mishra
Sabyasachi Mishra

Reputation: 1749

In SQL

SELECT MIN(id) as [ID],url, description= STUFF((SELECT '; ' 
+ ic.description FROM dbo.My_Table AS ic
WHERE ic.url= c.url
FOR XML PATH(''), TYPE).value('.','nvarchar(max)'), 1, 2, '')
FROM dbo.My_Table AS c
GROUP BY url
ORDER BY url;

Upvotes: 0

reaanb
reaanb

Reputation: 10065

It's easy to filter the table, just insert the result in a new table:

SELECT url, GROUP_CONCAT(description ORDER BY description SEPARATOR ' ') AS description
FROM `table`
GROUP BY url

Upvotes: 2

Related Questions