Reputation: 229
I have a large database that contains many urls, there are many domains repeating and i;m trying to get only the domain. eg:
http://example.com/someurl.html
http://example.com/someurl_on_the_same_domain.html
http://example.net/myurl.php
http://example.org/anotherurl.php
and i want to get only domains, eg:
http://example.com
http://example.net
http://example.org
My query is:
SELECT
id
,site
FROMtable
GROUP BYsite
ORDER BYid
DESC LIMIT 50
I need to use regex i think but i'm not mysql guru.
Upvotes: 4
Views: 6844
Reputation: 61
SELECT
COUNT(*) AS nCount,
SUBSTRING_INDEX(REPLACE(REPLACE(REPLACE(site,'http://',''),'https://',''),'www.',''),'/',1) AS sDomain
FROM tbl_table
GROUP BY sDomain
ORDER BY 1 DESC
Addon after JQman sollution with also the www. replaced and the group by + count
Upvotes: 6
Reputation: 46692
SELECT
SUBSTR(site, 1 , LOCATE('/', site, 8)-1)
as OnlyDomain
FROM table
GROUP BY OnlyDomain
ORDER BY id DESC LIMIT 50
[EDIT] : After OP request, here's the updated answer that will show correct results even if domain names does not have trailing slashes:
SELECT
SUBSTR(site, 1 , IF(LOCATE('/', site, 8), LOCATE('/', site, 8)-1, LENGTH(site)))
as OnlyDomain
FROM tablename
GROUP BY OnlyDomain
ORDER BY id DESC LIMIT 50
Upvotes: 11
Reputation: 229
SELECT id,
SUBSTRING_INDEX(REPLACE(REPLACE(site,'http://',''),'https://',''),'/',1) as domain
FROM table
GROUP BY domain
ORDER BY id DESC
LIMIT 50
That was working for me, if anybody needs it.
Upvotes: 1
Reputation: 271
You can use string replacement. Assuming that the "site" column contains the url:
select id,
substr(site,1,locate('/', site ,locate('//',site)+2)-1)
from table
group by site
order by id
desc limit 50;
Be careful to make sure that multiple slashes are encoded, eg:
http://example.com/somethingelse/someurl.html
Upvotes: 1
Reputation: 375574
You can select the domains with:
select left(site,6+locate('/',substring(site,8)))
Upvotes: 0