Reputation: 367
So I have my database set up as a single table. In that table I have collected source URL and description (I am scraping the product description from a number of pages). Unfortunately I have ended up with multiple rows in the database for a URL/source page if there is more than one paragraph.
What I would like to do is, if there are multiple rows with the same URL, combine the description from each row and then delete the duplicate rows for that URL.
My table is literally structured like so:
table
+----+----------------------------+-------------+
| id | url | description |
+----+----------------------------+-------------+
| 1 | http://example.com/page-a | paragraph 1 |
| 2 | http://example.com/page-a | paragraph 2 |
| 3 | http://example.com/page-a | paragraph 3 |
| 4 | http://example.com/page-b | paragraph 1 |
| 5 | http://example.com/page-b | paragraph 2 |
+----+----------------------------+-------------+
How I want it is like:
table
+----+----------------------------+-------------------------------------+
| id | url | description |
+----+----------------------------+-------------------------------------+
| 1 | http://example.com/page-a | paragraph 1 paragraph 2 paragraph 3 |
| 2 | http://example.com/page-b | paragraph 1 paragraph 2 |
+----+----------------------------+-------------------------------------+
I'm not so bothered about the IDs being updated to be correct, I just want to be able to combine the rows where the paragraphs should be in the same field as they are the same URLs, and then delete the duplicates.
Any help would be greatly appreciated!
Upvotes: 3
Views: 198
Reputation: 1269753
Create a new temporary table, truncate the original, and re-insert the data:
create temporary table tempt as
select (@rn := @rn + 1) as id, url,
group_concat(description order by id separator ' ') as description
from t cross join (select @rn := 0) params
group by url
order by min(id);
-- Do lots of testing and checking here to be sure you have the data you want.
truncate table t;
insert into t(id, url, description)
select id, url, description
from tempt;
If id
is already auto-incremented in the table, then you do not need to provide a value for it.
Upvotes: 1
Reputation: 1749
In SQL
SELECT MIN(id) as [ID],url, description= STUFF((SELECT '; '
+ ic.description FROM dbo.My_Table AS ic
WHERE ic.url= c.url
FOR XML PATH(''), TYPE).value('.','nvarchar(max)'), 1, 2, '')
FROM dbo.My_Table AS c
GROUP BY url
ORDER BY url;
Upvotes: 0
Reputation: 10065
It's easy to filter the table, just insert the result in a new table:
SELECT url, GROUP_CONCAT(description ORDER BY description SEPARATOR ' ') AS description
FROM `table`
GROUP BY url
Upvotes: 2