Preg_split matching more than what it should

Question

Code:

    $pattern = "/(http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/";
    $urls = array();
    preg_match($pattern, $comment, $urls);

    return $urls;

According to an online regex tester, this regex is correct and should be working:

http://regexr.com?35nf9

I am outputting the returned array using:

$linkItems = $model->getLinksInComment($model->comments);
//die(print_r($linkItems));
echo '<ul>';
foreach($linkItems as $link) {
    echo '<li>'.$link.'</li>';
}
echo '</ul>';

The output looks like the following:

http://google.com
http

The $model->comments looks like the following:

destined for surplus
RT#83015
RT#83617
http://google.com
https://google.com
non-link

The generated list is only supposed to contain links, and there should be no empty lines. Is there something wrong with what I did? The regex seems to be correct.

user428517 · Accepted Answer

If I'm understanding right, you should use preg_match_all in your getLinksInComment function instead:

preg_match_all($pattern, $comment, $matches);

if (isset($matches[0])) {
    return $matches[0];
}
return array();    #in case there are no matches

preg_match stops after the first match, so $urls held only that match and its first capture group, which is why your output was http://google.com followed by http. preg_match_all gets all matches in a string (even if the string contains newlines) and puts them into the array you supply as the third argument. However, anything matched by your regex's capture groups (e.g. (http|https|ftp|ftps)) will also be put into your $matches array (as $matches[1] and so on). That's why you want to return just $matches[0] as your final array of matches.
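To make the difference concrete, here is a minimal sketch that runs both functions on the sample comment from the question. The variable names $comment, $first and $all are just placeholders for illustration, and the newline-separated string is an assumption about what $model->comments contains.

```php
<?php
// Sample input taken from the question; $comment stands in for $model->comments.
$comment = "destined for surplus\nRT#83015\nRT#83617\nhttp://google.com\nhttps://google.com\nnon-link";
$pattern = '/(http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/';

// preg_match stops at the first match: index 0 is the whole match,
// index 1 is the first capture group (the scheme).
preg_match($pattern, $comment, $first);

// preg_match_all collects every match; index 0 holds the full matches.
preg_match_all($pattern, $comment, $all);

print_r($first);   // first match plus its capture group
print_r($all[0]);  // every full match
```

Looping over $first is what produces the stray "http" entry, while $all[0] contains only the URLs.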

I just ran this exact code:

$line = "destined for surplus

RT#83015

RT#83617

http://google.com

https://google.com

non-link";

$pattern = "/(http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/";
preg_match_all($pattern, $line, $matches);

var_dump($matches);

and got this for my output:

array(3) {
  [0]=>
  array(2) {
    [0]=>
    string(17) "http://google.com"
    [1]=>
    string(18) "https://google.com"
  }
  [1]=>
  array(2) {
    [0]=>
    string(4) "http"
    [1]=>
    string(5) "https"
  }
  [2]=>
  array(2) {
    [0]=>
    string(0) ""
    [1]=>
    string(0) ""
  }
}
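If you would rather not have the extra $matches[1] and $matches[2] entries at all, one option is to make the groups non-capturing with (?:...). The sketch below assumes PHP 7 or later; extractLinks is a hypothetical stand-in for getLinksInComment, and the pattern is a rewritten (but intended to be equivalent) form of the one above, using ~ as the delimiter so the slashes need no escaping.

```php
<?php
// Hypothetical helper, not the asker's actual getLinksInComment method.
function extractLinks(string $comment): array
{
    // (?:...) groups without capturing, so only the full matches are recorded.
    $pattern = '~(?:https?|ftps?)://[a-zA-Z0-9.-]+\.[a-zA-Z]{2,3}(?:/\S*)?~';

    // preg_match_all returns the number of matches, or false on error.
    if (preg_match_all($pattern, $comment, $matches) > 0) {
        return $matches[0];
    }
    return array(); // no matches (or a regex error)
}

$links = extractLinks("RT#83015\nhttp://google.com\nhttps://google.com\nnon-link");
var_dump($links);
```

With no capture groups in the pattern, $matches only ever has index 0, so there is nothing extra to filter out before looping over the result.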
