MEM

Reputation: 31307

preg_match_all - regex expression help

I've been given the following regular expression:

 if (preg_match_all("'(http://)?(www[.])?(youtube|vimeo)[^\s]+'is",$prova,$n))
 {
     foreach ($n[3] as $key => $site)
     {
         $video_links[$site][] = $n[0][$key];
     }
 }

However, if I have a string like:

"hello, look at my vimeo video here: http://www.vimeo.com..../ very nice hm?"

Instead of receiving only the URL, I'm ALSO getting the word vimeo.

I believe the regular expression is returning more than it should, and I would like to retrieve ONLY the URLs that it finds, not every reference to "vimeo" or "youtube".

Can I request your help in narrowing the scope of this expression, so that only the URLs are retrieved?

Upvotes: 0

Views: 510

Answers (2)

mario

Reputation: 145482

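The first question mark ? in the regex is unneeded. It makes the preceding http:// optional, and since www. is optional too, the pattern also matches the bare word vimeo in running text whenever a non-whitespace character follows it directly (for example "vimeo,"). Require the http:// prefix and a literal dot after the site name. Try: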

preg_match_all("'(http://)(www[.])?(youtube|vimeo)[.][^\s]+'is",
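To make the difference concrete, here is a minimal sketch that runs the pattern from the question and the stricter one side by side. The test string and the video id 12345 are made up for illustration; they are not from the question.

```php
<?php
// Hypothetical test string: the bare word is directly followed by a comma.
$text = "I like vimeo, see http://www.vimeo.com/12345 for details";

// Pattern from the question: http:// and www. are both optional.
preg_match_all("'(http://)?(www[.])?(youtube|vimeo)[^\s]+'is", $text, $loose);

// Stricter pattern: http:// is required and a literal dot must follow the site name.
preg_match_all("'(http://)(www[.])?(youtube|vimeo)[.][^\s]+'is", $text, $strict);

print_r($loose[0]);  // contains "vimeo," and the URL
print_r($strict[0]); // contains only the URL
```

Index 0 of the result array holds the full matches; the site name captured by the third group is in index 3.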

Tip: add the negative lookbehind (?<![,.)]) at the end if you want to exclude the trailing punctuation that often screws up such URL searches.
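A small sketch of that tip, assuming a made-up sentence in which the URLs are followed by a comma and by a closing parenthesis plus a period. The lookbehind forces the greedy [^\s]+ to back off until the match no longer ends in one of those characters.

```php
<?php
// Hypothetical sentence; both URLs are followed by punctuation.
$text = "Watch http://www.vimeo.com/12345, or (http://www.youtube.com/watch?v=abc).";

// Same pattern as above, with the negative lookbehind appended.
$pattern = "'(http://)(www[.])?(youtube|vimeo)[.][^\s]+(?<![,.)])'is";

preg_match_all($pattern, $text, $n);

// The trailing "," and ")." are left out of the matches.
print_r($n[0]);
```

Note that this also trims a period or parenthesis that genuinely belongs to a URL, so it is a heuristic rather than a strict rule.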


As an alternative, with http:// and www. optional, but requiring a path after the domain:

preg_match_all("'(http://|www[.])*(youtube|vimeo)[.]\w+/[^\s]+'is",
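A quick sketch of how this variant behaves, again on a made-up string: a link is accepted without any prefix as long as a slash and a path follow the domain, while a bare word or a bare domain is skipped.

```php
<?php
// Hypothetical input mixing a bare word, two links with paths, and a bare domain.
$text = "my vimeo page: vimeo.com/12345 and www.youtube.com/watch?v=abc but not vimeo.com alone";

// Prefix is optional, but "domain.tld/" followed by a path is required.
$pattern = "'(http://|www[.])*(youtube|vimeo)[.]\w+/[^\s]+'is";

preg_match_all($pattern, $text, $n);

// Only the two links that carry a path are matched.
print_r($n[0]);
```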

Upvotes: 2

The Dog

Reputation: 507

Maybe the following code can help out a bit:

<?php
    //Test string
    $prova = "\"hello, look at my <strong>vimeo</strong> video here:  <a href=\"http://www.vimeo.com..../\" rel=\"nofollow\">http://www.vimeo.com..../</a> very nice hm?\"";
    $prova .= " vimeo vimeo.com/something?id=somethingcrazy&testing=true  ";
    //start with an empty result so print_r also works when nothing matches
    $video_links = array();
    //if we match then capture all matches
    if (preg_match_all("'(http://)?(www\.)?(youtube|vimeo)\.([a-z0-9_/?&+=.]+)'is",$prova,$n)){
        foreach ($n[0] as $key => $site){
            //for each match that matched the whole pattern
            //save the match as a site
            $video_links[$site][] = $n[0][$key];
        }
    }
    //display results
    print_r($video_links);
?>

This will not match the bare word vimeo. It will match vimeo.com/something?id=somethingcrazy&testing=true once, and it will match http://www.vimeo.com..../ twice (once in the href attribute and once in the link text).
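If the links should be grouped by site, as in the question, and repeated URLs stored only once, one possible variation is sketched below. It reuses the pattern above but keys the result on the third capture group; the input string and its URLs are assumptions for illustration.

```php
<?php
// Hypothetical input; the same vimeo URL appears twice.
$text = 'See http://www.vimeo.com/12345 and youtube.com/watch?v=abc, then http://www.vimeo.com/12345 again.';

$video_links = array();
if (preg_match_all("'(http://)?(www\.)?(youtube|vimeo)\.([a-z0-9_/?&+=.]+)'is", $text, $n)) {
    foreach ($n[3] as $key => $site) {
        // Group 3 is the site name; normalise its case because of the i flag.
        $site = strtolower($site);
        // Store each full match only once per site.
        if (!isset($video_links[$site]) || !in_array($n[0][$key], $video_links[$site], true)) {
            $video_links[$site][] = $n[0][$key];
        }
    }
}
print_r($video_links);
```

Here the array keys are the site names (vimeo, youtube) and the values are the unique URLs found for each.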

Upvotes: 1
