Luke

Reputation: 23680

Validating YouTube URL using Regex

I'm trying to validate YouTube URLs for my application.

So far I have the following:

// Set the youtube URL
$youtube_url = "www.youtube.com/watch?v=vpfzjcCzdtCk";

if (preg_match("/((http\:\/\/){0,}(www\.){0,}(youtube\.com){1} || (youtu\.be){1}(\/watch\?v\=[^\s]){1})/", $youtube_url) == 1)
{
    echo "Valid";
}
else
{
    echo "Invalid";
}

I wish to validate the following variations of YouTube URLs:

However, I don't think I've got my logic right, because it also returns true for www.youtube.co/watch?v=vpfzjcCzdtCk (note the incorrect .co instead of .com).

Upvotes: 18

Views: 28289

Answers (6)

Kaligula

Reputation: 39

If you'd like to cover all YouTube URL variants try this:

^(?:(?:https?:)?\/\/)?(?:(?:(?:www|m(?:usic)?)\.)?youtu(?:\.be|be\.com)\/(?:shorts\/|live\/|v\/|e(?:mbed)?\/|watch(?:\/|\?(?:\S+=\S+&)*v=)|oembed\?url=https?%3A\/\/(?:www|m(?:usic)?)\.youtube\.com\/watch\?(?:\S+=\S+&)*v%3D|attribution_link\?(?:\S+=\S+&)*u=(?:\/|%2F)watch(?:\?|%3F)v(?:=|%3D))?|www\.youtube-nocookie\.com\/embed\/)([\w-]{11})[\?&#]?\S*$

It's a RegExp from a related question that covers every known YouTube URL form (including music.*, shorts/, live/, e/, embed/, v/, *-nocookie etc.). It also captures the video ID in group 1.

If you want, you can restrict the video ID further by using the pattern from Glenn's answer in place of ([\w-]{11}).
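
For use from PHP, the pattern can be passed to preg_match() roughly as follows. This is only a sketch: the function name youtube_video_id() and the sample ID abcdefghijk are made up for illustration, and ~ is chosen as the delimiter so the pattern can be pasted in unchanged.

```php
<?php
// Hypothetical helper (not part of the original answer): returns the 11-character
// video ID captured by the pattern, or null when the URL does not match.
function youtube_video_id(string $url): ?string
{
    // The pattern from above, unchanged, wrapped in ~ delimiters.
    $rx = '~^(?:(?:https?:)?\/\/)?(?:(?:(?:www|m(?:usic)?)\.)?youtu(?:\.be|be\.com)\/(?:shorts\/|live\/|v\/|e(?:mbed)?\/|watch(?:\/|\?(?:\S+=\S+&)*v=)|oembed\?url=https?%3A\/\/(?:www|m(?:usic)?)\.youtube\.com\/watch\?(?:\S+=\S+&)*v%3D|attribution_link\?(?:\S+=\S+&)*u=(?:\/|%2F)watch(?:\?|%3F)v(?:=|%3D))?|www\.youtube-nocookie\.com\/embed\/)([\w-]{11})[\?&#]?\S*$~';

    return preg_match($rx, $url, $m) === 1 ? $m[1] : null;
}

var_dump(youtube_video_id('https://www.youtube.com/watch?v=abcdefghijk')); // string(11) "abcdefghijk"
var_dump(youtube_video_id('https://youtu.be/abcdefghijk'));                // string(11) "abcdefghijk"
var_dump(youtube_video_id('www.youtube.co/watch?v=abcdefghijk'));          // NULL
```

Because the pattern ends in \S*$, anything after the first 11 ID characters is accepted, so an over-long ID is not rejected.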

Upvotes: 0

Steven Moseley

Reputation: 16325

This should do it:

// [\w-]+ rather than \w+, because video IDs may contain hyphens
$valid = preg_match("/^(https?\:\/\/)?(www\.)?(youtube\.com|youtu\.be)\/watch\?v\=[\w-]+$/", $youtube_url);
if ($valid) {
    echo "Valid";
} else {
    echo "Invalid";
}

Upvotes: 3

Linus Kleen

Reputation: 34632

There are a lot of redundancies in this regular expression of yours (and also, the leaning toothpick syndrome). This, though, should produce results:

$rx = '~
  ^(?:https?://)?                           # Optional protocol
   (?:www[.])?                              # Optional sub-domain
   (?:youtube[.]com/watch[?]v=|youtu[.]be/) # Mandatory domain name (w/ query string in .com)
   ([^&]{11})                               # Video id of 11 characters as capture group 1
    ~x';

$has_match = preg_match($rx, $url, $matches);

// if matching succeeded, $matches[1] would contain the video ID

Some notes:

  • use the tilde character ~ as delimiter, to avoid LTS
  • use [.] instead of \. to improve visual legibility and avoid LTS. (Most "special" characters, such as the dot ., lose their special meaning inside a character class, i.e. within square brackets.)
  • to make regular expressions more "readable" you can use the x modifier (which has further implications; see the docs on Pattern modifiers), which also allows for comments in regular expressions
  • capturing can be suppressed using non-capturing groups: (?: <pattern> ). This makes the expression more efficient.

Optionally, to extract values from a (more or less complete) URL, you might want to make use of parse_url():

$url = 'http://youtube.com/watch?v=VIDEOID';
$parts = parse_url($url);
print_r($parts);

Output:

Array
(
    [scheme] => http
    [host] => youtube.com
    [path] => /watch
    [query] => v=VIDEOID
)

Validating the domain name and extracting the video ID is left as an exercise to the reader.
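
One way the exercise could be completed is sketched below. It is only an illustration: the function name, the list of accepted hosts, and the 11-character ID check are assumptions rather than anything prescribed above. parse_url() only reports a host when the URL carries a scheme, so the sketch prepends one when it is missing.

```php
<?php
// Hypothetical helper: validate the host and pull the video ID out of a YouTube URL.
function extract_video_id(string $url): ?string
{
    // parse_url() only fills in "host" when a scheme is present, so add one if absent.
    if (!preg_match('~^https?://~i', $url)) {
        $url = 'https://' . $url;
    }

    $parts = parse_url($url);
    if ($parts === false || !isset($parts['host'])) {
        return null;
    }

    $host = strtolower($parts['host']);
    $path = $parts['path'] ?? '';
    $id = null;

    if (in_array($host, ['youtube.com', 'www.youtube.com'], true) && $path === '/watch') {
        // Long form: the ID is the "v" query parameter.
        parse_str($parts['query'] ?? '', $query);
        $id = $query['v'] ?? null;
    } elseif ($host === 'youtu.be') {
        // Short form: the ID is the path itself.
        $id = ltrim($path, '/');
    }

    // Assumed ID shape: exactly 11 characters from [A-Za-z0-9_-].
    return is_string($id) && preg_match('~^[\w-]{11}\z~', $id) === 1 ? $id : null;
}

var_dump(extract_video_id('http://youtube.com/watch?v=abcdefghijk'));  // string(11) "abcdefghijk"
var_dump(extract_video_id('www.youtube.co/watch?v=abcdefghijk'));      // NULL
```

Comparing the host against an explicit list is what rejects look-alike domains such as youtube.co.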


I gave in to the comment war below; thanks to Toni Oriol, the regular expression now works on short (youtu.be) URLs as well.

Upvotes: 39

Glenn Slayden

Reputation: 18749

I defer to the other answers on this page for parsing the URL syntax, but for the YouTube ID values themselves, you can be a little bit more specific, as I describe in the following answer on StackExchange/WebApps:

Format for ID of YouTube video: https://webapps.stackexchange.com/a/101153/141734


Video Id

For the videoId, it is an 8-byte (64-bit) integer. Applying Base64-encoding to 8 bytes of data requires 11 characters. However, since each Base64 character conveys exactly 6 bits, this allocation could actually hold up to 11 × 6 = 66 bits--a surplus of 2 bits over what our payload needs. The excess bits are set to zero, which has the effect of excluding certain characters from ever appearing in the last position of the encoded string. In particular, the videoId will always end with one of the following:

{ A, E, I, M, Q, U, Y, c, g, k, o, s, w, 0, 4, 8 }

Thus, a regular expression (RegEx) for the videoId would be as follows:

[-_A-Za-z0-9]{10}[AEIMQUYcgkosw048]
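
The arithmetic can be checked with a short sketch. It assumes, as the answer does, that an ID is the URL-safe Base64 encoding of 8 bytes with the padding removed; the helper name encode_id() is made up for the illustration.

```php
<?php
// Hypothetical helper: URL-safe Base64 of raw bytes, with "=" padding stripped.
function encode_id(string $bytes): string
{
    return rtrim(strtr(base64_encode($bytes), '+/', '-_'), '=');
}

$pattern = '/^[-_A-Za-z0-9]{10}[AEIMQUYcgkosw048]$/';

// Encode many random 8-byte values; each result should be 11 characters long
// and satisfy the pattern above.
$all_match = true;
for ($i = 0; $i < 1000; $i++) {
    $id = encode_id(random_bytes(8));
    if (strlen($id) !== 11 || preg_match($pattern, $id) !== 1) {
        $all_match = false;
    }
}

var_dump($all_match);
```

If the reasoning holds, this prints bool(true) on every run, since the two unused low bits of the final character are always zero.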

Channel or Playlist Id

The channelId and playlistId strings are produced by Base64-encoding a 128-bit (16-byte) binary integer. Here again, the Base64 calculation correctly predicts the observed string length of 22 characters. In this case, the output is capable of encoding 22 × 6 = 132 bits, a surplus of 4 bits; those zeros exclude most of the 64 alphabet symbols from the last position, leaving only 4 eligible. All channelId strings end in one of the following:

{ A, Q, g, w }

This gives us the regular expression for a channelId:

[-_A-Za-z0-9]{21}[AQgw]
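
Both character sets follow from the number of surplus bits, which the sketch below computes for an arbitrary payload size. The function name is illustrative, and the alphabet is the URL-safe Base64 one (A-Z, a-z, 0-9, -, _), an assumption consistent with the patterns above.

```php
<?php
// Hypothetical helper: characters that can appear in the last position when
// $num_bytes bytes are Base64-encoded without padding.
function final_characters(int $num_bytes): string
{
    // URL-safe Base64 alphabet, index 0..63 (assumed).
    $alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_';

    $bits = $num_bytes * 8;
    $length = intdiv($bits + 5, 6);   // number of 6-bit characters needed
    $surplus = $length * 6 - $bits;   // unused low bits in the last character
    $step = 1 << $surplus;            // last index must be a multiple of 2^surplus

    $allowed = '';
    for ($index = 0; $index < 64; $index += $step) {
        $allowed .= $alphabet[$index];
    }
    return $allowed;
}

echo final_characters(8), "\n";   // AEIMQUYcgkosw048
echo final_characters(16), "\n";  // AQgw
```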

Upvotes: 4

Jason McCreary

Reputation: 72991

An alternative to Regular Expressions would be parse_url().

 $parts = parse_url($url);
 if ($parts['host'] == 'youtube.com' && ...) {
   // your code
 }

While it is more code, it is more readable and therefore more maintainable.

Upvotes: 5

eisberg

Reputation: 3771

Please try:

// Set the youtube URL
$youtube_url = "www.youtube.com/watch?v=vpfzjcCzdtCk";

if (preg_match("/^(https?\:\/\/)?(www\.)?((youtube\.com)|(youtu\.be))(\/watch\?v\=[^\s]+)$/", $youtube_url) == 1)
{
    echo "Valid";
}
else
{
    echo "Invalid";
}

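Your pattern used ||, which places an empty alternative between the two bars. An empty alternative matches anything, so without the ^ and $ anchors the pattern succeeds on any input. Use a single | and anchor the pattern with ^ and $.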
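
The effect of the doubled bar can be seen in isolation. The sketch below uses a simplified pattern (only the two domain names, an assumption made to keep the example short) to show that an empty alternative lets an unanchored pattern match any input.

```php
<?php
// "||" puts an empty alternative between the two domains; the empty
// alternative matches at position 0 of any string.
$loose = '/(youtube\.com||youtu\.be)/';

// A single "|" plus ^ and $ anchors only accepts the two exact domains.
$strict = '/^(youtube\.com|youtu\.be)$/';

var_dump(preg_match($loose, 'www.youtube.co'));  // int(1)
var_dump(preg_match($strict, 'youtube.co'));     // int(0)
var_dump(preg_match($strict, 'youtube.com'));    // int(1)
```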

Upvotes: 4
