Rob

Reputation: 125

PHP Strip domain name from url

I know there is a LOT of info on the web about this subject, but I can't seem to get it working the way I want.

I'm trying to build a function that strips a URL down to its domain name:

http://blabla.com    blabla
www.blabla.net       blabla
http://www.blabla.eu blabla

Only the plain name of the domain is needed.

With parse_url I get the domain filtered, but that is not enough. I have three functions that strip the domain, but I still get some wrong outputs:

function prepare_array($domains)
{
    $prep_domains = explode("\n", str_replace("\r", "", $domains)); 
    $domain_array = array_map('trim', $prep_domains); 

    return $domain_array;
}

function test($domain) 
{
    $domain = explode(".", $domain);
    return $domain[1];
}

function strip($url) 
{ 
   $url = trim($url);
   $url = preg_replace("/^(http:\/\/)*(www.)*/is", "", $url); 
   $url = preg_replace("/\/.*$/is" , "" ,$url); 
   return $url; 
}

Every possible domain, URL and extension is allowed. When the function is finished, it must return an array containing only the domain names themselves.

UPDATE: Thanks for all the suggestions!

I figured it out with help from all of you.

function test($url) 
{   
    // Check if the url begins with http://, https://, www. or a scheme followed by www.
    // If so, remove it (the dot in "www." is escaped so it only matches a literal dot)
    if (preg_match("/^(https?:\/\/|www\.)/i", $url))
    {
        $domain = preg_replace("/^(https?:\/\/)?(www\.)?/i", "", $url);
    }
    else
    {
        $domain = $url;
    }

    // Now all that's left is the domain and the extension
    // Only return the needed first part without the extension
    $domain = explode(".", $domain);

    return $domain[0];
}

Upvotes: 3

Views: 2590

Answers (5)

Rob

Reputation: 125

function test($url) 
{   
    // Check if the url begins with http://, https://, www. or a scheme followed by www.
    // If so, remove it (the dot in "www." is escaped so it only matches a literal dot)
    if (preg_match("/^(https?:\/\/|www\.)/i", $url))
    {
        $domain = preg_replace("/^(https?:\/\/)?(www\.)?/i", "", $url);
    }
    else
    {
        $domain = $url;
    }

    // Now all that's left is the domain and the extension
    // Only return the needed first part without the extension
    $domain = explode(".", $domain);

    return $domain[0];
}

Upvotes: 1

Jim_M

Reputation: 273

OK, this is messy, and you should spend some time optimizing it and caching previously derived domains. You also need a friendly name server, and the last catch is that the domain must have an "A" record in its DNS.

This attempts to assemble the domain name in reverse order, one segment at a time, until the result resolves to a DNS "A" record.

At any rate, this was bugging me, so I hope this answer helps:

<?php
$wsHostNames = array(
    "test.com",
    "http://www.bbc.com/news/uk-34276525",
    "google.uk.co"
);
foreach ($wsHostNames as $hostName) {
    echo "checking $hostName" . PHP_EOL;
    $wsWork = $hostName;
    //attempt to strip out full paths to just host
    $wsWork = parse_url($hostName, PHP_URL_HOST);
    if ($wsWork != "") {
        echo "Was able to cleanup $wsWork" . PHP_EOL;
        $hostName = $wsWork;
    } else {
        //Probably had no path info or malformed URL
        //Try to check it anyway
        echo "No path to strip from $hostName" . PHP_EOL;
    }

    $wsArray = explode(".", $hostName); //Break it up into an array.

    $wsHostName = "";
    //Build the domain one segment at a time, starting from the last one
    //Code should be modified to skip the lookup for the first segment (.com)
    while (!empty($wsArray)) {
        $newSegment = array_pop($wsArray);
        $wsHostName = $newSegment . $wsHostName;
        echo "Checking $wsHostName" . PHP_EOL;
        if (checkdnsrr($wsHostName, "A")) {
            echo "host found $wsHostName" . PHP_EOL;
            echo "Domain is $newSegment" . PHP_EOL;
            continue 2; //move on to the next entry in $wsHostNames
        } else {
            //This segment didn't resolve - keep building
            echo "No Valid A Record for $wsHostName" . PHP_EOL;
            $wsHostName = "." . $wsHostName;
        }
    }
    //if you get to here in the loop it could not resolve the host name

}
?>

Upvotes: 2

luis martinez

Reputation: 182

Try preg_replace.

Something like $domain = preg_replace($regex, '$1', $url);

regex

Upvotes: 1

James Dunne

Reputation: 770

Ah, your problem lies in the fact that TLDs can have either one or two parts, e.g. .com vs. .co.uk.

What I would do is maintain a list of TLDs. Take the host you get from parse_url, go over the list and look for a TLD that matches the end of it. Strip out that TLD, explode what remains on '.', and the last part will be the name you want.

This does not seem as efficient as it could be but, with TLDs being added all the time, I cannot see any other deterministic way.
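
A minimal sketch of that idea, in case it helps. The function name domain_label() and the tiny $suffixes list are made up for illustration; a real list would be much longer (the Public Suffix List is one possible source). It also assumes that inputs without a scheme, like www.blabla.net, should be treated as http URLs, because parse_url only reports a host when a scheme is present.

```php
<?php
// Hypothetical, deliberately tiny suffix list for illustration only.
$suffixes = array('co.uk', 'com', 'net', 'eu', 'uk');

// Hypothetical helper: returns the label directly in front of the
// longest matching suffix, or null if no listed suffix matches.
function domain_label($url, array $suffixes)
{
    $url = trim($url);

    // parse_url() needs a scheme to recognise the host, so add one if missing.
    if (!preg_match('#^[a-z][a-z0-9+.-]*://#i', $url)) {
        $url = 'http://' . $url;
    }

    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return null;
    }
    $host = strtolower($host);

    // Try longer suffixes first so that "co.uk" wins over "uk".
    usort($suffixes, function ($a, $b) {
        return strlen($b) - strlen($a);
    });

    foreach ($suffixes as $suffix) {
        $tail = '.' . $suffix;
        if (strlen($host) > strlen($tail) && substr($host, -strlen($tail)) === $tail) {
            // Drop the suffix, then keep the last remaining label.
            $labels = explode('.', substr($host, 0, -strlen($tail)));
            return end($labels);
        }
    }

    return null; // no known suffix matched
}
```

With this list, domain_label('http://www.blabla.eu', $suffixes) gives blabla and domain_label('http://www.bbc.co.uk/news', $suffixes) gives bbc. Anything whose suffix is not in the list comes back as null, which is the price of the list-based approach.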

Upvotes: 2

Jim_M

Reputation: 273

How about:

$wsArray = explode(".",$domain); //Break it up into an array. 
$extension = array_pop($wsArray); //Get the Extension (last entry)
$domain = array_pop($wsArray); // Get the domain

http://php.net/manual/en/function.array-pop.php
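
For what it's worth, here is the same idea wrapped in a function so it can be tried on a few inputs. The name second_to_last_label() is made up for this sketch, and it assumes $host is already a bare host name (no scheme or path), for example one obtained via parse_url.

```php
<?php
// Hypothetical helper name, used here only for illustration.
function second_to_last_label($host)
{
    $parts = explode('.', strtolower(trim($host))); // break it up into an array
    if (count($parts) < 2) {
        return null; // nothing in front of the extension
    }
    array_pop($parts);        // discard the extension (last entry)
    return array_pop($parts); // the entry before it
}

echo second_to_last_label('www.blabla.net'), PHP_EOL; // blabla
echo second_to_last_label('blabla.com'), PHP_EOL;     // blabla
echo second_to_last_label('www.bbc.co.uk'), PHP_EOL;  // co
```

The last line shows the limitation: with a two-part extension such as .co.uk, the second-to-last entry is part of the extension rather than the domain.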

Upvotes: 3
