madkris24

Reputation: 493

Regex for dropping http:// and www. from URLs

I have a bunch of URLs like these:

  $urls = array(
    'https://site1.com',
    'https://www.site2.com',
    'http://www.site3.com',
    'https://site4.com',
    'site5.com',
    'www.site6.com',
    'www.site7.co.uk',
    'site8.tk'
  );

I want to remove the http://, https://, and www. prefixes from these strings so that the output looks like this:

  $urls = array(
    'site1.com',
    'site2.com',
    'site3.com',
    'site4.com',
    'site5.com',
    'site6.com',
    'site7.co.uk',
    'site8.tk'
  );

I came up with this solution:

foreach ($urls as $url) {
   $pattern = '/(http[s]?:\/\/)?(www\.)?/i';
   $replace = "";
   echo "before: $url after: ".preg_replace('/\/$/', '', preg_replace($pattern, $replace, $url))."\n";
}

I was wondering how I could avoid the second preg_replace. Any ideas?

Upvotes: 3

Views: 4245

Answers (4)

NikiC

Reputation: 101926

Depending on what exactly it is you want to do, it might be better to stick with PHP's own URL parsing facilities, namely parse_url:

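foreach ($urls as &$url) {
    // parse_url() only finds a host when a scheme is present, so add one if it is missing.
    $host = parse_url(preg_match('~^https?://~i', $url) ? $url : "http://$url", PHP_URL_HOST);
    // Escape the dot so that only a literal "www." prefix is removed.
    $url = preg_replace('~^www\.~i', '', $host);
}
unset($url);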

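parse_url will give you the host of the URL even if the URL contains a port number or HTTP authentication data. (Whether this is what you need depends on your exact use case.) Note that parse_url only detects a host when the URL has a scheme: for a bare string such as 'site5.com' it treats the whole string as the path, so PHP_URL_HOST comes back as null.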
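
To see what parse_url actually hands back, here is a small sketch. The sample URLs are made up for illustration and are not from the question; the point is only to show which part ends up in PHP_URL_HOST.

```php
<?php
// Hypothetical sample inputs, not taken from the question.
$samples = array(
    'https://user:secret@www.example.com:8080/path?x=1',
    'http://www.example.org/',
    'example.net', // no scheme
);

$hosts = array();
foreach ($samples as $sample) {
    // PHP_URL_HOST yields only the host component; it is null when no host is found.
    $hosts[$sample] = parse_url($sample, PHP_URL_HOST);
}

var_dump($hosts);
```

The first two entries come back as www.example.com and www.example.org (credentials, port, and path are dropped), while the scheme-less one comes back as null.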

Upvotes: 0

Alix Axel

Reputation: 154543

Short and sweet:

$urls = preg_replace('~^(?:https?://)?(?:www[.])?~i', '', $urls);
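
If some of the inputs can end in a slash (the second preg_replace in the question suggests they might), one possible way to cover both jobs in a single call is an alternation. This is a variation I am sketching here, not something the pattern above already does, and the trailing slashes in the sample data are my own assumption.

```php
<?php
// Sample inputs; the trailing slashes are an assumption added for illustration.
$urls = array(
    'https://site1.com/',
    'http://www.site3.com',
    'www.site7.co.uk/',
    'site8.tk',
);

// First branch: optional scheme and optional "www." anchored at the start.
// Second branch: a single slash at the very end of the string.
$pattern = '~^(?:https?://)?(?:www[.])?|/$~i';

$clean = preg_replace($pattern, '', $urls);

print_r($clean);
```

Because preg_replace replaces every match, the prefix and the trailing slash are both removed in one pass, leaving site1.com, site3.com, site7.co.uk and site8.tk.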

Upvotes: 0

Paul

Reputation: 141829

preg_replace can also take an array, so you don't even need the loop. You can do this with a one-liner:

$urls = preg_replace('/^(?:https?:\/\/)?(?:www\.)?(.*?)\/?$/i', '$1', $urls);
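
Whether a trailing slash really gets dropped with a pattern of this shape depends on the capture group being lazy rather than greedy. The comparison below is a rough sketch with made-up inputs; both patterns are anchored with ^ here so that the only difference is (.*) versus (.*?).

```php
<?php
// Hypothetical inputs, one with a trailing slash and one without.
$urls = array('https://www.site2.com/', 'site5.com');

// Greedy (.*) takes everything to the end, so the optional \/? matches nothing.
$greedy = preg_replace('/^(?:https?:\/\/)?(?:www\.)?(.*)\/?$/i', '$1', $urls);

// Lazy (.*?) stops as early as it can, leaving the final slash for \/? to consume.
$lazy = preg_replace('/^(?:https?:\/\/)?(?:www\.)?(.*?)\/?$/i', '$1', $urls);

print_r($greedy);
print_r($lazy);
```

With the greedy group the first entry stays as site2.com/ (slash included); with the lazy group it becomes site2.com. The entry without a slash is site5.com either way.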

Upvotes: 14

sidyll

Reputation: 59287

/^(https?:\/\/)?(www\.)?(.*?)\/?$/i

And use what's in $3. Or, even better, change the first two parentheses to the non-capturing version (?:) and use what's in $1.
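
To make the group numbering concrete, here is a minimal sketch using preg_match. The pattern is a variant with a lazy capture and an optional trailing slash, which is my assumption so that it also matches inputs without a slash; the variable names are just for illustration.

```php
<?php
$url = 'http://www.site3.com/';

// Capturing groups: scheme is group 1, "www." is group 2, the rest is group 3.
preg_match('/^(https?:\/\/)?(www\.)?(.*?)\/?$/i', $url, $m);
$withCapturing = $m[3];

// Non-capturing groups (?:...) are not numbered, so the rest becomes group 1.
preg_match('/^(?:https?:\/\/)?(?:www\.)?(.*?)\/?$/i', $url, $n);
$withNonCapturing = $n[1];

echo $withCapturing, "\n", $withNonCapturing, "\n";
```

Both variables end up holding site3.com; the non-capturing version just saves you from counting past groups you never use.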

Upvotes: 13
