CBeTJlu4ok

Reputation: 1112

Filter YouTube links from content with regex

I have an input area where people post updates. I want to filter YouTube links out of the content, modify them, and append them at the end.

This content is not HTML; it doesn't even contain <br> or <p> tags. It's just a plain string.

Here is the code I adapted from a different part of the program.

What it should do is take all matches and replace them with HTML.

function aKaFilter( $content ) {
    global $bp;

    $pattern2 = '#^(?:https?://)?(?:www\.)?(?:youtube(?:-nocookie)?\.com/(?:[^/]+/.+/|(?:v|e(?:mbed)?)/|.*[?&]v=)|youtu\.be/)([^"&?/ ]{11})(?:.+)?$#x';
    preg_match_all( $pattern2, $content, $youtubes );
    if ( $youtubes ) {
        /* Make sure there's only one instance of each video */
        if ( !$youtubes = array_unique( $youtubes[1] ) )
            return $content;

        //but we need to watch for edits and if something was already wrapped in html link - thus check for space or word boundary prior
        foreach( (array)$youtubes as $youtube ) {
            $pattern = "NEW". $youtube ."PATTERN TO MATCH THIS LINK";
            $content = preg_replace( $pattern, '<span class="video youtube" data-trigger="'.$youtube.'"><img src="http://img.youtube.com/vi/'.$youtube.'/0.jpg"><span class="icon-stack"><i class="icon-circle icon-stack-base"></i><i class="icon-youtube-play"></i></span><span>title</span></span>', $content );
        }
    }

    return $content;
}

Here is the original code:

function etivite_bp_activity_hashtags_filter( $content ) {
    global $bp;

    //what are we doing here? - same at atme mentions
    //$pattern = '/[#]([_0-9a-zA-Z-]+)/';
    $pattern = '/(?(?<!color: )(?<!color: )[#]([_0-9a-zA-Z-]+)|(^|\s|\b)[#]([_0-9a-zA-Z-]+))/';

    preg_match_all( $pattern, $content, $hashtags );
    if ( $hashtags ) {
        /* Make sure there's only one instance of each tag */
        if ( !$hashtags = array_unique( $hashtags[1] ) )
            return $content;

        //but we need to watch for edits and if something was already wrapped in html link - thus check for space or word boundary prior
        foreach( (array)$hashtags as $hashtag ) {
            $pattern = "/(^|\s|\b)#". $hashtag ."($|\b)/";
            $content = preg_replace( $pattern, ' <a href="' . $bp->root_domain . "/" . $bp->activity->slug . "/". BP_ACTIVITY_HASHTAGS_SLUG ."/" . htmlspecialchars( $hashtag ) . '" rel="nofollow" class="hashtag">#'. htmlspecialchars( $hashtag ) .'</a>', $content );
        }
    }

    return $content;
}

What it does is take the textarea content and replace each #hash with <a>#hash</a>, producing hashtag links like the ones you see on social media.

What I want my function to do is take YouTube links and convert them to <a>ID</a> (basically).

It works fine if the content is only a YouTube link, but when there is text before or after the link, it just goes crazy.

I guess it doesn't work because I haven't come up with the second $pattern, which the other program has.

Upvotes: 0

Views: 1047

Answers (4)

Suman

Reputation: 134

Try using these URLs:

Result in JSON format: http://gdata.youtube.com/feeds/mobile/videos?alt=json&q=music&format=1,5,6

Result in XML format: http://gdata.youtube.com/feeds/mobile/videos?q=music&format=1,5,6

Then, for the XML format, use a regular expression on the string tag:youtube.com,2008:video:qycqF1CWcXg and retrieve the video ID, i.e. "qycqF1CWcXg" in this example.

The same steps apply to the JSON format.
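A minimal sketch of the regex step described above, assuming the entry ID has exactly the form shown (tag:youtube.com,2008:video: followed by the ID) and that video IDs are 11 characters drawn from letters, digits, '_' and '-'. The function name video_id_from_tag is hypothetical.

```php
<?php
// Hypothetical helper: extract the video ID from a feed entry ID such as
// "tag:youtube.com,2008:video:qycqF1CWcXg".
// Assumption: IDs are 11 characters of [A-Za-z0-9_-]; other feed formats may differ.
function video_id_from_tag(string $tag): ?string
{
    if (preg_match('/^tag:youtube\.com,2008:video:([A-Za-z0-9_-]{11})$/', $tag, $m)) {
        return $m[1]; // the captured ID
    }
    return null; // not in the expected form
}

echo video_id_from_tag('tag:youtube.com,2008:video:qycqF1CWcXg'); // qycqF1CWcXg
```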

Upvotes: 0

polkduran

Reputation: 2551

The problem with matching URLs inside a text using regexes is that you can't know where the URL ends.

URLs can contain 'spaces', '.', ',' and other characters, so you can't say that the URL ends when a new word begins or when a sentence ends. Besides, the end of your regex, (?:.+)?, will match (almost) everything.

If you assume that a YouTube URL cannot contain whitespace (after a given position in the URL), you can replace the end of your regex with (?:[^\s]+)? (anything but whitespace). You can add other characters to the set to define where your URL ends; for example, if the URL must not contain , either, use (?:[^\s,]+)?, and so on.

Next, your regex has beginning and ending anchors (^ and $), so it can only match when the content begins with the URL. That won't work when your URL is surrounded by other text, so remove those anchors and add the \b (word boundary) anchor at the beginning of your regex.

By the way, you can replace (?:.+)? with .* and (?:[^\s,]+)? with [^\s,]*.

You now have a regex like this: '#\b(?:https?://)?(?:www\.)?(?:youtube(?:-nocookie)?\.com/(?:[^/]+/.+/|(?:v|e(?:mbed)?)/|.*[?&]v=)|youtu\.be/)([^"&?/ ]{11})[^\s,]*#x'

NB: I did not analyze all the logic of your regex, so my comments only apply to its beginning and end.
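A sketch of the regex above in use, assuming the goal from the question: turn each link in free text into markup built from its ID. youtube_links_to_html is a hypothetical name, and the <a> markup is a stand-in for the question's <span class="video youtube"> HTML. preg_replace_callback() handles every match in one pass, so no second per-ID pattern is needed. The block also runs the question's anchored pattern on the same text for comparison.

```php
<?php
// Hypothetical helper: replace every YouTube link in $content with markup built from its ID.
// Uses the unanchored, \b-prefixed regex from this answer.
function youtube_links_to_html(string $content): string
{
    $pattern = '#\b(?:https?://)?(?:www\.)?(?:youtube(?:-nocookie)?\.com/(?:[^/]+/.+/|(?:v|e(?:mbed)?)/|.*[?&]v=)|youtu\.be/)([^"&?/ ]{11})[^\s,]*#x';

    return preg_replace_callback($pattern, function (array $m): string {
        // $m[0] is the whole link, $m[1] the 11-character ID
        $id = htmlspecialchars($m[1], ENT_QUOTES);
        return '<a href="https://www.youtube.com/watch?v=' . $id . '">' . $id . '</a>';
    }, $content);
}

$text = 'Look at this http://youtu.be/dQw4w9WgXcQ it is great';

// The question's pattern: without the m modifier, ^ only matches at offset 0
$anchored = '#^(?:https?://)?(?:www\.)?(?:youtube(?:-nocookie)?\.com/(?:[^/]+/.+/|(?:v|e(?:mbed)?)/|.*[?&]v=)|youtu\.be/)([^"&?/ ]{11})(?:.+)?$#x';
var_dump(preg_match_all($anchored, $text)); // int(0)

echo youtube_links_to_html($text);
// Look at this <a href="https://www.youtube.com/watch?v=dQw4w9WgXcQ">dQw4w9WgXcQ</a> it is great
```

One caveat: the .* in the ...[?&]v= branch can still run across spaces, so two youtube.com/watch?v= links in the same post may be merged into a single match that captures only the second ID.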

Upvotes: 1

Tyron

Reputation: 1976

Why do you need preg_replace()? In your case, str_replace() should suffice. Also, you probably need to iterate over $youtubes[0], not $youtubes. Plus, simplify your code! ;-)

Ergo this should work:

function aKaFilter( $content ) {
    global $bp;

    $pattern2 = '#^(?:https?://)?(?:www\.)?(?:youtube(?:-nocookie)?\.com/(?:[^/]+/.+/|(?:v|e(?:mbed)?)/|.*[?&]v=)|youtu\.be/)([^"&?/ ]{11})(?:.+)?$#x';
    preg_match_all( $pattern2, $content, $youtubes );

    /* $youtubes[0] holds the full matched links, $youtubes[1] the video IDs.
       Make sure there's only one instance of each link; array_unique() keeps
       the original keys, so $youtubes[1][$i] still belongs to $youtubes[0][$i]. */
    $links = array_unique( $youtubes[0] );

    foreach ( $links as $i => $link ) {
        $id = $youtubes[1][$i];

        // Plain string replacement: the link is literal text, so no regex is needed here
        $content = str_replace( $link, '<span class="video youtube" data-trigger="'.$id.'"><img src="http://img.youtube.com/vi/'.$id.'/0.jpg"><span class="icon-stack"><i class="icon-circle icon-stack-base"></i><i class="icon-youtube-play"></i></span><span>title</span></span>', $content );
    }

    return $content;
}
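A small sketch of why str_replace() is enough here: the full link captured by preg_match_all() contains regex metacharacters such as '.' and '?', so reusing it as a preg_replace() pattern would need preg_quote(), while str_replace() treats it as literal text. The sample text, URL and markup are made up for illustration.

```php
<?php
// Illustrative values only
$content = 'Watch http://www.youtube.com/watch?v=dQw4w9WgXcQ now';
$url     = 'http://www.youtube.com/watch?v=dQw4w9WgXcQ';
$html    = '<a>dQw4w9WgXcQ</a>';

// Literal replacement: '.', '?' and '=' in the URL need no escaping
$viaStr = str_replace($url, $html, $content);

// Regex replacement: the URL must be escaped first, including the '#' delimiter
$viaPreg = preg_replace('#' . preg_quote($url, '#') . '#', $html, $content);

var_dump($viaStr === $viaPreg); // bool(true)
echo $viaStr;                   // Watch <a>dQw4w9WgXcQ</a> now
```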

Upvotes: 1

Michael Hampton

Reputation: 10000

Don't use a regex for this at all; use parse_url().

For instance:

$parsed_url = parse_url($content);
// parse_url() omits 'host' (or returns false) for input without a scheme or host, so check it first
if (isset($parsed_url['host']) && in_array($parsed_url['host'], array('www.youtube.com', 'youtube.com', 'www.youtube-nocookie.com', 'youtube-nocookie.com'))) {
    ## Now look through $parsed_url['query'] for the video ID
    ## Parsing this out is a separate question :)
}
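A sketch of the part left open above, assuming the input is a single URL that includes its scheme (parse_url() puts scheme-less input such as www.youtube.com/... into 'path', with no 'host'). parse_str() turns the query string into an array so the ID can be read from 'v'. The function name youtube_id_from_url is hypothetical, and the youtu.be and /embed/ handling is an addition not present in the answer.

```php
<?php
// Hypothetical helper: read a YouTube video ID from one URL without a regex.
// Assumes the URL includes http:// or https://.
function youtube_id_from_url(string $url): ?string
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['host'])) {
        return null; // malformed, or no scheme so no host
    }

    $host = strtolower($parts['host']);
    $youtubeHosts = ['www.youtube.com', 'youtube.com', 'www.youtube-nocookie.com', 'youtube-nocookie.com'];

    if (in_array($host, $youtubeHosts, true)) {
        // watch?v=ID: split the query string into an array and read 'v'
        if (isset($parts['query'])) {
            parse_str($parts['query'], $query);
            if (isset($query['v']) && is_string($query['v'])) {
                return $query['v'];
            }
        }
        // /embed/ID or /v/ID: the ID is the second path segment
        $segments = explode('/', trim($parts['path'] ?? '', '/'));
        if (count($segments) >= 2 && in_array($segments[0], ['embed', 'v'], true)) {
            return $segments[1];
        }
        return null;
    }

    // youtu.be/ID short links keep the ID as the first path segment
    if ($host === 'youtu.be') {
        $id = explode('/', trim($parts['path'] ?? '', '/'))[0];
        return $id !== '' ? $id : null;
    }

    return null;
}

var_dump(youtube_id_from_url('https://www.youtube.com/watch?v=dQw4w9WgXcQ&t=42')); // string(11) "dQw4w9WgXcQ"
var_dump(youtube_id_from_url('http://youtu.be/dQw4w9WgXcQ'));                      // string(11) "dQw4w9WgXcQ"
var_dump(youtube_id_from_url('www.youtube.com/watch?v=dQw4w9WgXcQ'));              // NULL (no scheme)
```

Because this works on one URL at a time, free-text content would first have to be split into candidate tokens (for example on whitespace) before each token is passed in.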

Upvotes: 1
