Grant

Reputation: 1337

Regex only for specific domain name in URL

As much as I've tried, I can't seem to find the correct regex for what I'm after here.

I only want to select the first instance of the url that matches the domain www.myweb.com from the following...

Some text https://www.myweb.com/page/cat/323123442321-rghe432 and then another https://www.adifferentsite.com/fsdhjss/erwr

I need to completely ignore the second URL (www.adifferentsite.com) and only work with the first one that matches www.myweb.com, ignoring any later instances of www.myweb.com.

Once the first matching domain is discovered I need to store the rest of the url that comes after it...

page/cat/323123442321-rghe432

...into a new variable $newvar, so...

$newvar = 'page/cat/323123442321-rghe432';

I'm trying:

return preg_replace_callback( '/http://www.myweb.com/\/[0-9a-zA-Z]+/', array( __CLASS__, 'my_callback' ), $newvar );

I've read tons of documents on how to detect URLs but can't find anything about detecting a specific URL.

I really can't grasp how to formulate a regex, so this pattern is incorrect. Any help would be greatly appreciated.

EDIT Edited the question to be a bit more specific and hopefully a bit easier to resolve.

Upvotes: 2

Views: 1650

Answers (1)

Wiktor Stribiżew

Reputation: 627410

You can use preg_replace_callback and pass an array by reference into the anonymous function (or your own custom callback) so that it collects all the necessary URL parts.

Here is a demo:

$rests = array();
$re = '~\b(https?://)www\.myweb\.com/(\S+)~'; 
$str = "Some text https://www.myweb.com/page/cat/323123442321-rghe432 and then another https://www.adifferentsite.com/fsdhjss/erwr"; 
echo $result = preg_replace_callback($re, function ($m) use (&$rests) {
    array_push($rests, $m[2]);
    return $m[1] . "embed.myweb.com/" . $m[2];
}, $str) . PHP_EOL;
print_r($rests);

Results:

Some text https://embed.myweb.com/page/cat/323123442321-rghe432 and then another https://www.adifferentsite.com/fsdhjss/erwr
Array
(
    [0] => page/cat/323123442321-rghe432
)

A couple of words:

  • '~\b(https?://)www\.myweb\.com/(\S+)~' has ~ as a regex delimiter, so you do not have to escape /
  • It is declared with a single-quoted literal, so you do not have to double the backslash in \S
  • It captures two substrings: (https?://), which matches http or https followed by :// and must start at a word boundary because of the preceding \b, and (\S+), which matches 1 or more non-whitespace characters. Capturing groups are marked with (...) in the pattern and are available in the match array passed to the callback ($m above) as $m[n], where n is the number of the capturing group.
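To make the group numbering concrete, here is a minimal sketch that runs the same pattern through preg_match and prints each entry of the match array. It uses the pattern and sample string from the demo above; nothing else is assumed.

```php
<?php
// Same pattern and input as in the demo above.
$re  = '~\b(https?://)www\.myweb\.com/(\S+)~';
$str = "Some text https://www.myweb.com/page/cat/323123442321-rghe432 and then another https://www.adifferentsite.com/fsdhjss/erwr";

// preg_match fills $m with the whole match at index 0,
// followed by one entry per capturing group.
if (preg_match($re, $str, $m)) {
    echo $m[0], PHP_EOL; // https://www.myweb.com/page/cat/323123442321-rghe432
    echo $m[1], PHP_EOL; // https://
    echo $m[2], PHP_EOL; // page/cat/323123442321-rghe432
}
```

The callback passed to preg_replace_callback receives an array with the same layout, which is why $m[1] and $m[2] are used there.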

UPDATE

If you only need to replace the first occurrence of the URL, pass 1 as the limit argument (the fourth parameter) to preg_replace_callback:

$rest = "";
// The trailing \b keeps punctuation such as the comma after the first URL out of the capture
$re = '~\b(https?://)www\.myweb\.com/(\S+\b)~';
$str = "Some text https://www.myweb.com/page/cat/323123442321-rghe432, another http://www.myweb.com/page/cat/323123442321-rghe432 and then another https://www.adifferentsite.com/fsdhjss/erwr";
echo $result = preg_replace_callback($re, function ($m) use (&$rest) {
    $rest = $m[2]; // the part of the URL after the domain
    return $m[1] . "embed.myweb.com/" . $m[2];
}, $str, 1) . PHP_EOL; // 4th argument (limit) = 1: only the first match is replaced
echo $rest;

See another IDEONE demo
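If the text itself does not need to be rewritten and the goal is only to fill $newvar, preg_match may be enough, since it stops at the first match. The following is a sketch under that assumption, not part of the original demo; the helper name first_myweb_path is hypothetical, and the nullable return type requires PHP 7.1 or later.

```php
<?php
// Hypothetical helper (not from the original answer): returns the part of the
// first www.myweb.com URL that follows the domain, or null if there is none.
function first_myweb_path(string $text): ?string
{
    // Same idea as the pattern above; only the path is captured here.
    $re = '~\bhttps?://www\.myweb\.com/(\S+\b)~';

    // preg_match stops after the first match, so later URLs are ignored.
    return preg_match($re, $text, $m) === 1 ? $m[1] : null;
}

$str = "Some text https://www.myweb.com/page/cat/323123442321-rghe432, another http://www.myweb.com/page/cat/323123442321-rghe432 and then another https://www.adifferentsite.com/fsdhjss/erwr";

$newvar = first_myweb_path($str);
var_dump($newvar); // string(29) "page/cat/323123442321-rghe432"
```

Returning null when nothing matches lets the caller tell "no URL found" apart from an empty path.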

Upvotes: 2
