eben
eben

Reputation: 229

GROUP domain from url in MySql

I have a large database that contains many urls, there are many domains repeating and i;m trying to get only the domain. eg:

http://example.com/someurl.html
http://example.com/someurl_on_the_same_domain.html
http://example.net/myurl.php
http://example.org/anotherurl.php

and i want to get only domains, eg:

http://example.com
http://example.net
http://example.org

My query is:

SELECT id, site FROM table GROUP BY site ORDER BY id DESC LIMIT 50

I need to use regex i think but i'm not mysql guru.

Upvotes: 4

Views: 6844

Answers (5)

Dennis de Jong
Dennis de Jong

Reputation: 61

SELECT 
COUNT(*) AS nCount,
SUBSTRING_INDEX(REPLACE(REPLACE(REPLACE(site,'http://',''),'https://',''),'www.',''),'/',1) AS sDomain 
FROM tbl_table
GROUP BY sDomain 
ORDER BY 1 DESC

Addon after JQman sollution with also the www. replaced and the group by + count

Upvotes: 6

shamittomar
shamittomar

Reputation: 46692

SELECT
    SUBSTR(site, 1 , LOCATE('/', site, 8)-1)
        as OnlyDomain
    FROM table
    GROUP BY OnlyDomain
    ORDER BY id DESC LIMIT 50

[EDIT] : After OP request, here's the updated answer that will show correct results even if domain names does not have trailing slashes:

SELECT
    SUBSTR(site, 1 , IF(LOCATE('/', site, 8), LOCATE('/', site, 8)-1, LENGTH(site)))
        as OnlyDomain
    FROM tablename
    GROUP BY OnlyDomain
    ORDER BY id DESC LIMIT 50

Upvotes: 11

eben
eben

Reputation: 229

SELECT id,
       SUBSTRING_INDEX(REPLACE(REPLACE(site,'http://',''),'https://',''),'/',1) as domain 
       FROM table
       GROUP BY domain 
       ORDER BY id DESC 
       LIMIT 50

That was working for me, if anybody needs it.

Upvotes: 1

zevra0
zevra0

Reputation: 271

You can use string replacement. Assuming that the "site" column contains the url:

select id, 
substr(site,1,locate('/', site ,locate('//',site)+2)-1)
from table 
group by site
order by id
desc limit 50;

Be careful to make sure that multiple slashes are encoded, eg:

http://example.com/somethingelse/someurl.html

Upvotes: 1

Ned Batchelder
Ned Batchelder

Reputation: 375574

You can select the domains with:

select left(site,6+locate('/',substring(site,8)))

Upvotes: 0

Related Questions