MyFault

Reputation: 427

PHP: Remove http://, http://www., https://, https://www. from a string and get the domain name and TLD

I would like to create a PHP function that strips prefixes such as

http://
https://
http://www.
https://www.
http://xyz.

from a given domain name like

example.com

and returns an array like this:

'name' => 'example'
'tld' => 'com'

Any ideas how to do that?

Upvotes: 2

Views: 296

Answers (3)

Oleksandr Fediashov

Reputation: 4335

The correct way to extract the real TLD is to use a package that works from the Public Suffix List. Only then can you correctly handle domains with second- and third-level suffixes (co.uk, a.bg, b.bg, etc.). I recommend TLD Extract.

Here is sample code:

$extract = new LayerShifter\TLDExtract\Extract();

$result = $extract->parse('http://mail.yahoo.co.uk');
$result->getSubdomain(); // will return (string) 'mail'
$result->getHostname(); // will return (string) 'yahoo'
$result->getSuffix(); // will return (string) 'co.uk'
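
To illustrate why a suffix list matters, here is a minimal sketch of longest-suffix matching. It does not use the TLD Extract package. The `SAMPLE_SUFFIXES` constant and the `splitHost()` helper are hypothetical names invented for this example, and the list is a tiny hand-picked subset, not the real Public Suffix List.

```php
<?php

// Hypothetical, hand-picked subset of suffixes for illustration only;
// the real Public Suffix List contains thousands of entries.
const SAMPLE_SUFFIXES = ['com', 'uk', 'co.uk', 'bg', 'a.bg', 'b.bg'];

/**
 * Split a host name into subdomain, name and suffix using the longest
 * matching suffix from the given list. Returns null if nothing matches.
 */
function splitHost(string $host, array $suffixes = SAMPLE_SUFFIXES): ?array
{
    $labels = explode('.', strtolower($host));

    // Start at index 1 so at least one label is left for the name;
    // earlier indexes give longer candidate suffixes, so the longest wins.
    for ($i = 1; $i < count($labels); $i++) {
        $candidate = implode('.', array_slice($labels, $i));
        if (in_array($candidate, $suffixes, true)) {
            return [
                'subdomain' => implode('.', array_slice($labels, 0, $i - 1)),
                'name'      => $labels[$i - 1],
                'tld'       => $candidate,
            ];
        }
    }

    return null;
}

print_r(splitHost('mail.yahoo.co.uk'));
```

Because "co.uk" is tried before "uk", the name comes out as "yahoo" rather than "co". A real implementation would load the full list and also handle its wildcard and exception rules, which this sketch ignores.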

Upvotes: 0

Saleem

Reputation: 8978

Try the following regex:

(?:^|\s)(?:https?:\/\/)?(?:\w+(?=\.).)?(?<name>.*).(?<tld>(?<=\.)\w+)

See demo at https://regex101.com/r/lI2lB4/2

If your input is

www.google.com
mail.yahoo.com.in
http://microsoft.com
http://www.google.com
http://mail.yahoo.co.uk

Captured content will be:

MATCH 1
name       = `google`
tld        = `com`

MATCH 2
name       = `yahoo.com`
tld        = `in`

MATCH 3
name       = `microsoft`
tld        = `com`

MATCH 4
name       = `google`
tld        = `com`

MATCH 5
name       = `yahoo.co`
tld        = `uk`
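
Since the question asks for PHP, here is one way the pattern above might be applied with `preg_match()` and its named groups. The `extractWithRegex()` helper is a hypothetical name used only for this sketch; the pattern itself is unchanged apart from being wrapped in `/` delimiters.

```php
<?php

// The pattern from this answer, wrapped in "/" delimiters for preg_match().
$pattern = '/(?:^|\s)(?:https?:\/\/)?(?:\w+(?=\.).)?(?<name>.*).(?<tld>(?<=\.)\w+)/';

/**
 * Hypothetical helper: returns ['name' => ..., 'tld' => ...], or null when
 * the pattern does not match the input.
 */
function extractWithRegex(string $input, string $pattern): ?array
{
    if (preg_match($pattern, $input, $m) !== 1) {
        return null;
    }

    // Named groups are available as string keys in the match array.
    return ['name' => $m['name'], 'tld' => $m['tld']];
}

print_r(extractWithRegex('http://www.google.com', $pattern));
```

As the captured content above shows, multi-part suffixes such as co.uk end up split between `name` and `tld`, so this approach suits inputs with single-label TLDs.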

Upvotes: 1

Shafizadeh

Reputation: 10340

I don't think you need to remove the protocol, www, or even the subdomain. All you need is to extract the name and TLD from the URL. So try this:

RegEx solution:

<?php

$url  = 'https://www.example.com#anchor';
$host = parse_url($url, PHP_URL_HOST);  // www.example.com
preg_match('/(\w+)\.(\w+)$/', $host, $matches);
$array_result = array ( "name" => $matches[1],
                        "tld"  => $matches[2] );
print_r($array_result);


Without RegEx:

<?php

$url  = 'https://www.example.com#anchor';
$host = parse_url($url, PHP_URL_HOST);  // www.example.com
$host_names = explode(".", $host);
$array_result = array ( "name" => $host_names[count($host_names)-2],
                        "tld"  =>  $host_names[count($host_names)-1] );
print_r($array_result);


/*
 Output:
 *    Array
 *    (
 *        [name] => example
 *        [tld] => com
 *    ) 
*/
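
One caveat with both variants: `parse_url()` only reports a host when the input has a scheme (or a leading `//`), so a bare input such as `www.example.com` yields no host. A minimal sketch of one way to handle that, assuming it is acceptable to prepend `http://` when no scheme is present; the `nameAndTld()` helper is a hypothetical name for this example.

```php
<?php

/**
 * Hypothetical helper: returns ['name' => ..., 'tld' => ...] for a URL or a
 * bare host name, or null if no host with at least two labels is found.
 */
function nameAndTld(string $input): ?array
{
    // Prepend a scheme for bare inputs so parse_url() can find the host.
    if (!preg_match('~^[a-z][a-z0-9+.-]*://~i', $input)) {
        $input = 'http://' . $input;
    }

    // parse_url() returns null or false when it cannot determine a host.
    $host = parse_url($input, PHP_URL_HOST);
    if (!is_string($host)) {
        return null;
    }

    $labels = explode('.', $host);
    $count  = count($labels);
    if ($count < 2) {
        return null;
    }

    // Same rule as above: the last two labels are taken as name and TLD.
    return ['name' => $labels[$count - 2], 'tld' => $labels[$count - 1]];
}

print_r(nameAndTld('www.example.com'));
```

Like the code above, this takes the last two labels at face value, so a host ending in co.uk would give "co" as the name.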

Upvotes: 1
