Chamilyan

Reputation: 9423

YouTube ID parsing for new URL formats

This question has been asked before and I found this:

Reg exp for youtube link

but I'm looking for something slightly different.

I need to match the YouTube ID itself, in a way that works with all the possible YouTube link formats, not only those beginning with youtube.com.

For example:

http://www.youtube.com/watch?v=-wtIMTCHWuI

http://www.youtube.com/v/-wtIMTCHWuI?version=3&autohide=1

http://youtu.be/-wtIMTCHWuI

http://www.youtube.com/oembed?url=http%3A//www.youtube.com/watch?v%3D-wtIMTCHWuI&format=json

http://s.ytimg.com/yt/favicon-wtIMTCHWuI.ico

http://i2.ytimg.com/vi/-wtIMTCHWuI/hqdefault.jpg

Is there a clever strategy I can use to match the video ID -wtIMTCHWuI in all of these formats? I'm thinking of character counting and matching on the = ? / . & characters.

Upvotes: 21

Views: 19403

Answers (5)

Nic

Reputation: 1

I don't know if this is what you're looking for, but I found this great list of YouTube URLs (GitHub).

Some of the URLs in the list are for proxy services and attribution links. In my use case, the supplied string can be either a URL or just an ID, so a regex match won't do.

So, based on all of the possibilities here, the ID can be extracted with two regex replaces:

^.+(\/|vi?=|v%3D)

In all sampled cases, this selects everything from the start of the string up to the start of the ID (demo on RegExr).

[^a-zA-Z0-9_\-].+$

Additionally, in all sampled cases, this selects everything from the end of the ID (in the now-truncated string) to the end of the string (demo on RegExr).
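
For illustration, the two replaces could be chained in PHP roughly like this. It is only a sketch: the function name extract_youtube_id_by_stripping and the ~ delimiters are my own assumptions, and only the two patterns come from the answer above.

```php
<?php
// Hypothetical helper: applies the two patterns quoted above, one after the other.
function extract_youtube_id_by_stripping(string $input): string
{
    // Step 1: remove everything up to and including the last "/", "v=", "vi=" or "v%3D".
    $withoutPrefix = preg_replace('~^.+(\/|vi?=|v%3D)~', '', $input);

    // Step 2: remove everything from the first character that cannot belong to an ID.
    return preg_replace('~[^a-zA-Z0-9_\-].+$~', '', $withoutPrefix);
}

$samples = [
    'http://www.youtube.com/watch?v=-wtIMTCHWuI',
    'http://www.youtube.com/v/-wtIMTCHWuI?version=3&autohide=1',
    'http://youtu.be/-wtIMTCHWuI',
    '-wtIMTCHWuI', // a bare ID passes through unchanged
];

foreach ($samples as $sample) {
    echo extract_youtube_id_by_stripping($sample), PHP_EOL;
}
```

One caveat from trying it on the question's URLs: a thumbnail URL such as http://i2.ytimg.com/vi/-wtIMTCHWuI/hqdefault.jpg comes out as hqdefault, because the first pattern strips up to the last slash.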

Upvotes: 0

Shibizle

Reputation: 21

It's a bit late, but I wrote this regex today. It not only identifies the links but also returns the video ID via match group 6:

^(https?\:\/\/)?(www\.)?(youtube\.com|youtu\.?be)(\/)?(watch\?v=|\?v=)?(.*)$

https://gist.github.com/Shibizle/3c6707911ea716860786728d31f8e3e5

Test it: https://regex101.com/r/l0m7yh/1
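
As a rough sketch of how group 6 could be read in PHP (the function name youtube_group_six and the ~ delimiters are assumptions of mine; the pattern itself is the one above):

```php
<?php
// Hypothetical wrapper around the pattern above; returns group 6 or false.
function youtube_group_six(string $url)
{
    $pattern = '~^(https?\:\/\/)?(www\.)?(youtube\.com|youtu\.?be)(\/)?(watch\?v=|\?v=)?(.*)$~';

    if (!preg_match($pattern, $url, $matches)) {
        return false; // host is neither youtube.com nor youtu.be
    }

    // Group 6 is (.*), so it holds whatever follows the optional "watch?v=" part.
    return $matches[6];
}

var_dump(youtube_group_six('http://www.youtube.com/watch?v=-wtIMTCHWuI'));
var_dump(youtube_group_six('http://youtu.be/-wtIMTCHWuI'));
var_dump(youtube_group_six('https://www.youtube.com/watch?v=-wtIMTCHWuI&feature=share'));
```

Because group 6 is (.*), anything after the ID (such as &feature=share in the last call) is captured along with it, so the result may need trimming before use.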

Upvotes: 2

eyecatchUp

Reputation: 10560

I had to deal with this for a PHP class I wrote a few weeks ago and ended up with a regex that matches any kind of string: with or without a URL scheme, with or without a subdomain, youtube.com URLs, youtu.be URLs, and the v parameter in any position in the query string. You can check it out on GitHub or simply copy and paste the code block below:

/**
 *  Check if input string is a valid YouTube URL
 *  and try to extract the YouTube Video ID from it.
 *  @author  Stephan Schmitz
 *  @param   $url   string   The string that shall be checked.
 *  @return  mixed           Returns YouTube Video ID, or (boolean) false.
 */
function parse_yturl($url)
{
    $pattern = '#^(?:https?://|//)?(?:www\.|m\.)?(?:youtu\.be/|youtube\.com/(?:embed/|v/|watch\?v=|watch\?.+&v=))([\w-]{11})(?![\w-])#';
    preg_match($pattern, $url, $matches);
    return (isset($matches[1])) ? $matches[1] : false;
}

Test cases: https://3v4l.org/GEDT0
JavaScript version: https://stackoverflow.com/a/10315969/624466

To explain the regex, here's a split up version:

/**
 *  Check if input string is a valid YouTube URL
 *  and try to extract the YouTube Video ID from it.
 *  @author  Stephan Schmitz
 *  @param   $url   string   The string that shall be checked.
 *  @return  mixed           Returns YouTube Video ID, or (boolean) false.
 */
function parse_yturl($url)
{
    $pattern = '#^(?:https?://|//)?' # Optional URL scheme. Either http, or https, or protocol-relative.
             . '(?:www\.|m\.)?'      #  Optional www or m subdomain.
             . '(?:'                 #  Group host alternatives:
             .   'youtu\.be/'        #    Either youtu.be,
             .   '|youtube\.com/'    #    or youtube.com
             .     '(?:'             #    Group path alternatives:
             .       'embed/'        #      Either /embed/,
             .       '|v/'           #      or /v/,
             .       '|watch\?v='    #      or /watch?v=,
             .       '|watch\?.+&v=' #      or /watch?other_param&v=
             .     ')'               #    End path alternatives.
             . ')'                   #  End host alternatives.
             . '([\w-]{11})'         # 11 characters (Length of Youtube video ids).
             . '(?![\w-])#';         # Rejects if overlong id.
    preg_match($pattern, $url, $matches);
    return (isset($matches[1])) ? $matches[1] : false;
}
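
To see which of the URLs from the question this pattern covers, a quick check along these lines may help. The pattern is copied from the function above so the snippet runs on its own; the name parse_yturl_check is hypothetical.

```php
<?php
// Same pattern as parse_yturl() above, repeated so this snippet is self-contained.
// The function name parse_yturl_check is hypothetical.
function parse_yturl_check(string $url)
{
    $pattern = '#^(?:https?://|//)?(?:www\.|m\.)?(?:youtu\.be/|youtube\.com/(?:embed/|v/|watch\?v=|watch\?.+&v=))([\w-]{11})(?![\w-])#';
    return preg_match($pattern, $url, $matches) ? $matches[1] : false;
}

$urls = [
    'http://www.youtube.com/watch?v=-wtIMTCHWuI',
    'http://www.youtube.com/v/-wtIMTCHWuI?version=3&autohide=1',
    'http://youtu.be/-wtIMTCHWuI',
    'http://www.youtube.com/oembed?url=http%3A//www.youtube.com/watch?v%3D-wtIMTCHWuI&format=json',
    'http://i2.ytimg.com/vi/-wtIMTCHWuI/hqdefault.jpg',
];

foreach ($urls as $url) {
    $id = parse_yturl_check($url);
    echo $url, ' => ', ($id === false ? 'no match' : $id), PHP_EOL;
}
```

The first three URLs yield -wtIMTCHWuI. The oembed and ytimg.com URLs do not match, since the pattern is anchored to youtube.com and youtu.be hosts with the listed paths.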

Upvotes: 50

Attila Fulop

Reputation: 7011

Currently I'm using this:

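function _getYoutubeVideoId($url)
{
  $parts = parse_url($url);

  // For seriously malformed URLs, or input without a host (e.g. a bare ID)
  if ($parts === false || !isset($parts['host'])) {
     return false;
  }

  switch ($parts['host']) {
     case 'youtu.be':
        // The ID is the path without its leading slash
        return isset($parts['path']) ? substr($parts['path'], 1) : false;
     case 'youtube.com':
     case 'www.youtube.com':
        // The ID is the "v" query parameter, if present
        parse_str(isset($parts['query']) ? $parts['query'] : '', $params);
        return isset($params['v']) ? $params['v'] : false;
     default:
        return false;
  }
}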

It could be extended, but right now it works for most cases.
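
One possible extension, purely as a sketch: also accept the m.youtube.com host and the /embed/ and /v/ path forms seen elsewhere in this thread. The function name get_youtube_video_id_extended and the choice of extra hosts and paths are my own assumptions, not part of the original function.

```php
<?php
// Hypothetical extension of the parse_url() approach; requires PHP 7 or later.
function get_youtube_video_id_extended(string $url)
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['host'])) {
        return false; // malformed URL or no host (e.g. a bare ID)
    }

    $host = strtolower($parts['host']);
    $path = $parts['path'] ?? '';

    if ($host === 'youtu.be') {
        // Short links carry the ID as the path without its leading slash.
        $id = substr($path, 1);
        return ($id !== '' && $id !== false) ? $id : false;
    }

    if (in_array($host, ['youtube.com', 'www.youtube.com', 'm.youtube.com'], true)) {
        // /embed/<id> and /v/<id> carry the ID in the path.
        if (preg_match('~^/(?:embed|v)/([\w-]+)~', $path, $m)) {
            return $m[1];
        }
        // Otherwise fall back to the "v" query parameter.
        parse_str($parts['query'] ?? '', $params);
        return (isset($params['v']) && is_string($params['v'])) ? $params['v'] : false;
    }

    return false;
}

var_dump(get_youtube_video_id_extended('http://www.youtube.com/v/-wtIMTCHWuI?version=3&autohide=1'));
var_dump(get_youtube_video_id_extended('http://www.youtube.com/embed/zc0s358b3Ys'));
```

Thumbnail and oembed URLs from the question are still not handled here and would need their own cases.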

Upvotes: 0

Nick Budden

Reputation: 641

I found this code at this link:

<?php 
/** 
 *  parse_youtube_url() PHP function 
 *  Author: takien 
 *  URL: http://takien.com 
 *  
 *  @param  string  $url    URL to be parsed, eg:  
 *                            http://youtu.be/zc0s358b3Ys,  
 *                            http://www.youtube.com/embed/zc0s358b3Ys
 *                            http://www.youtube.com/watch?v=zc0s358b3Ys 
 *  @param  string  $return what to return 
 *                            - embed, return embed code 
 *                            - thumb, return URL to thumbnail image
 *                            - hqthumb, return URL to high quality thumbnail image.
 *  @param  string  $width  width of embedded video, default 560
 *  @param  string  $height height of embedded video, default 349
 *  @param  string  $rel    whether the embedded video shows related videos after playback or not.
 */  

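 function parse_youtube_url($url,$return='embed',$width='',$height='',$rel=0){ 
    $urls = parse_url($url); 
    // Guard against missing parts (e.g. a bare ID has no host)
    $host = isset($urls['host']) ? $urls['host'] : '';
    $path = isset($urls['path']) ? $urls['path'] : '';

    //expect url is http://youtu.be/abcd, where abcd is video ID
    if($host == 'youtu.be'){  
        $id = ltrim($path,'/'); 
    } 
    //expect url is http://www.youtube.com/embed/abcd 
    else if(strpos($path,'embed') === 1){  
        // end() takes its argument by reference, so store the array first
        $segments = explode('/',$path);
        $id = end($segments); 
    } 
    //expect url is abcd only 
    else if(strpos($url,'/')===false){ 
        $id = $url; 
    } 
    //expect url is http://www.youtube.com/watch?v=abcd 
    else{ 
        // parse_str() needs an explicit result array (required since PHP 8)
        parse_str(isset($urls['query']) ? $urls['query'] : '', $params); 
        $id = isset($params['v']) ? $params['v'] : ''; 
    } 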
    //return embed iframe 
    if($return == 'embed'){ 
        return '<iframe width="'.($width?$width:560).'" height="'.($height?$height:349).'" src="http://www.youtube.com/embed/'.$id.'?rel='.$rel.'" frameborder="0" allowfullscreen></iframe>'; 
    } 
    //return normal thumb 
    else if($return == 'thumb'){ 
        return 'http://i1.ytimg.com/vi/'.$id.'/default.jpg'; 
    } 
    //return hqthumb 
    else if($return == 'hqthumb'){ 
        return 'http://i1.ytimg.com/vi/'.$id.'/hqdefault.jpg'; 
    } 
    // else return id 
    else{ 
        return $id; 
    } 
} 
?>

I'm dealing with this too, so if you find a better solution please let me know. It doesn't quite do what you need for image URLs out of the box, but it could easily be adapted.

Upvotes: 3
