LXSoft

Reputation: 596

Parsing complex URLs

I'm trying to parse a list of URL strings, but after two hours of work I haven't gotten any result. The list of URL strings looks like this:

$url_list = array(
    'http://google.com',
    'http://localhost:8080/test/project/',
    'http://mail.yahoo.com',
    'http://www.bing.com',
    'http://www.phpromania.net/forum/viewtopic.php?f=24&t=7549',
    'https://prodgame10.alliances.commandandconquer.com/12/index.aspx',
    'https://prodgame10.alliances.commandandconquer.ro/12/index.aspx',
);

Output should be:

Array
(
    [0] => .google.com
    [1] => .localhost
    [2] => .yahoo.com
    [3] => .bing.com
    [4] => .phpromania.net
    [5] => .commandandconquer.com
)

The first thing that trips me up is URLs with more than two dots. Can anyone give an example algorithm?


This is what I tried:

    $url_list = array(
        'http://google.com',
        'http://localhost:8080/test/project/',
        'http://mail.yahoo.com',
        'http://www.bing.com',
        'http://www.phpromania.net/forum/viewtopic.php?f=24&t=27549',
        'https://prodgame10.alliances.commandandconquer.com/12/index.aspx',
    );

    function size($list)
    {
        $i=0;
        while($list[++$i]!=NULL);
        return $i;
    }

    function url_Host($list)
    {
        $listSize = size($list)-1;
        do
        {
            $strSize = size($list[$listSize]);
            $points = 0;
            $dpoints = 0;
            $tmpString = '';
            do
            {
                $currentChar = $list[$listSize][$strSize];
                if(ord('.')==ord($currentChar))
                {
                    $tmpString .= '.';
                    $points++;
                }
                else if(ord(':')==ord($currentChar))
                {
                    $tmpString .= ':';
                    $dpoints++;
                }
            }while($list[$listSize][--$strSize]!=NULL);
            print $tmpString;
            $strSize = size($list[$listSize]);
            $tmpString = '';
            do
            {
                $slice = false;
                $currentChar = $list[$listSize][$strSize];
                if($dpoints > 2)
                {
                    if(ord('\\')==ord($curentChar)) $slice = true;
                    $tmpString .= '';
                }
            }while($list[$listSize][--$strSize]!=NULL);
            print $tmpString."<br />";
        }while($list[--$listSize]);
    }

    url_Host($url_list);

Upvotes: 1

Views: 263

Answers (4)

Monir Khan

Reputation: 19

We can also use array_map() with an arrow function (PHP 7.4+) to simplify the code. I'm refactoring @Alessandro Minoccheri's code here.

$domains = array_map(fn($url) => implode('.', array_slice(explode('.', parse_url($url, PHP_URL_HOST)), -2)), $url_list);
var_dump($domains);

Upvotes: -1

Amal Murali

Reputation: 76646

You can use the built-in function parse_url() as follows:

function getDomain($url) 
{
    $domain = implode('.', array_slice(explode('.', parse_url($url, PHP_URL_HOST)), -2));
    return $domain;
}

Test cases:

$result = array();
foreach ($url_list as $url) {
    $result[] = getDomain($url);
}

Output:

Array
(
    [0] => google.com
    [1] => localhost
    [2] => yahoo.com
    [3] => bing.com
    [4] => phpromania.net
    [5] => commandandconquer.com
    [6] => commandandconquer.ro
)

As for the dots, you can manually prepend one to each string, like so:

$result[] = "." . getDomain($url);

I'm not sure why you need to do this, but this should work.
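
Putting the pieces together, a minimal sketch might look like the following. It repeats the getDomain() approach from above with a guard for URLs that have no host, and uses a shortened copy of the question's $url_list. getDottedDomains() is a hypothetical helper name, not a PHP built-in. Keep in mind that taking the last two labels gives the wrong answer for hosts under multi-part suffixes such as .co.uk.

```php
<?php
// Same idea as getDomain() above: keep the last two dot-separated labels of the host.
function getDomain($url)
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        return null; // malformed URL, or no host component found
    }
    return implode('.', array_slice(explode('.', $host), -2));
}

// Hypothetical helper: prefix each domain with a dot, skipping URLs without a host.
function getDottedDomains(array $urls)
{
    $result = array();
    foreach ($urls as $url) {
        $domain = getDomain($url);
        if ($domain !== null) {
            $result[] = '.' . $domain;
        }
    }
    return $result;
}

// Shortened copy of the list from the question.
$url_list = array(
    'http://google.com',
    'http://localhost:8080/test/project/',
    'https://prodgame10.alliances.commandandconquer.com/12/index.aspx',
);

print_r(getDottedDomains($url_list));
```

For the three URLs above this should give .google.com, .localhost and .commandandconquer.com.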

Upvotes: 4

Alessandro Minoccheri

Reputation: 35963

First, the expected result for localhost makes no sense, but try this:

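$result = array();

foreach ($url_list as $u) {
    // Drop the scheme ("http:" or "https:"), then split the remainder on dots.
    $arr  = explode('//', $u);
    $arr2 = explode('.', $arr[1]);
    // Keep the first label, skipping a leading "www".
    if ($arr2[0] == 'www') {
        array_push($result, $arr2[1]);
    } else {
        array_push($result, $arr2[0]);
    }
}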

Upvotes: 1

Sverri M. Olsen

Reputation: 13263

Look at parse_url. For example:

$url  = 'http://www.phpromania.net/forum/viewtopic.php?f=24&t=7549';
$host = parse_url($url, PHP_URL_HOST);
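
To see what else parse_url() gives you, here is a small sketch using another URL from the question. Called without a second argument, it returns an associative array of the components it found, so the port and path are already separated from the host. The variable names are just for illustration.

```php
<?php
$url   = 'http://localhost:8080/test/project/';
$parts = parse_url($url);

// $parts holds the components found in the URL:
// scheme => http, host => localhost, port => 8080, path => /test/project/
print_r($parts);

// Ask for a single component by passing a PHP_URL_* constant.
$host = parse_url($url, PHP_URL_HOST); // "localhost"
```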

Upvotes: 2
