Leo Starić

Reputation: 331

REGEXP in PHP to catch specific domain links

So I'm working on a regexp to catch all links in a string, meaning words that start with a protocol like http, https etc., words that start with www., or words that end in some specific domains: ".com", ".hr" and ".net". But for some reason this regexp always returns all the links that start with a protocol, and only the last of those that end in a specific domain. What am I doing wrong :|? Many thanks!

$description='test.com test2.hr http://www.test3.hr https://test4.com test3.net';
$pattern = '/\b(?:(?:https?|ftp|file):\/\/|www\.|ftp\.)[-A-Z0-9+&@#\/%=~_|$?!:,.]*[A-Z0-9+&@#\/%=~_|$]|(?:\b((?:[\w]+\.com$)|(?:[\w]+\.hr$)|(?:[\w]+\.net$)))/i';
preg_match_all($pattern, $description, $out);
var_dump($out[0]);
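A stripped-down reproduction of the symptom may help. The sketch below runs only a simplified version of the domain branch, once ending in `$` and once ending in `\b`. The subject string and the folded `(?:com|hr|net)` group are illustrative simplifications, not the exact pattern above:

```php
<?php
// Minimal reproduction with a hypothetical subject string: the domain-only
// branch of the pattern, anchored with `$` versus a word boundary `\b`.
$subject = 'test.com test2.hr test3.net';

// Without the m modifier, `$` asserts "end of subject" (or just before a
// final newline), so only a domain at the very end of the string can match.
preg_match_all('/\b\w+\.(?:com|hr|net)$/i', $subject, $anchored);

// `\b` asserts a word boundary instead, so every domain-like word can match.
preg_match_all('/\b\w+\.(?:com|hr|net)\b/i', $subject, $bounded);

var_dump($anchored[0]); // only "test3.net"
var_dump($bounded[0]);  // "test.com", "test2.hr", "test3.net"
```

Because `$` is an end-of-subject assertion wherever it sits in the pattern, the anchored version can only ever report the final domain in the string, which matches the behaviour described in the question.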

Upvotes: 0

Views: 71

Answers (1)

Chris Brendel

Reputation: 700

There are a few problems with your original regex. First, you should make the protocol part optional with the ? quantifier. I'm not sure why you're using the second character class [A-Z0-9+&@#\/%=~_|$] or why you're using the | operator after that; if there's a specific reason, please let me know. Finally, without the m modifier, $ means end of the subject string wherever it appears in the regex, not only at the very end of it, so [\w]+\.com$ can only match a domain that is the last thing in the string. That is why only the last of your domain-only links was returned. \Z behaves the same way, and I don't think you want to be matching end-of-string in here anyway. I've rewritten the regex below in the way I think you want it to work:

$description='test.com test2.hr http://www.test3.hr https://test4.com test3.net trash string don\'t match test4.net';
$pattern = '/(?:(?:https?|ftp|file):\/\/(?:www|ftp)\.)?[-A-Z0-9+&@#\/%=~_|$?!:,.]*(\.[A-Z]+)/i';
preg_match_all($pattern, $description, $out);
var_dump($out[0]);

outputs:

array(6) {
  [0]=>
  string(8) "test.com"
  [1]=>
  string(8) "test2.hr"
  [2]=>
  string(19) "http://www.test3.hr"
  [3]=>
  string(17) "https://test4.com"
  [4]=>
  string(9) "test3.net"
  [5]=>
  string(9) "test4.net"
}
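The pattern above accepts any dot-plus-letters ending, so a word like `notes.txt` would also match, whereas the question only asked for .com, .hr and .net. A minimal sketch, assuming the goal is to keep the question's original structure: replace each `$` with the word boundary `\b` and fold the three domains into one group. The `(?:[\w-]+\.)+` part, which also accepts subdomains such as `sub.test.com`, is an addition of this sketch and not part of either pattern above:

```php
<?php
// Sketch: the question's pattern with `$` replaced by `\b` and the three
// domain alternatives folded into a single group.
$description = 'test.com test2.hr http://www.test3.hr https://test4.com test3.net';

$pattern = '/\b(?:(?:https?|ftp|file):\/\/|www\.|ftp\.)'       // protocol, www. or ftp. prefix
         . '[-A-Z0-9+&@#\/%=~_|$?!:,.]*[A-Z0-9+&@#\/%=~_|$]'   // URL body, not ending in punctuation
         . '|\b(?:[\w-]+\.)+(?:com|hr|net)\b/i';                // bare host ending in .com/.hr/.net

preg_match_all($pattern, $description, $out);
var_dump($out[0]);
```

Because the protocol alternative is tried first at each position, a match starting at `http` consumes the whole URL, so `www.test3.hr` is not reported a second time on its own.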

Upvotes: 1
