Henrik Petterson

Reputation: 7094

Regex to get YouTube URL from string

I have the following code which grabs YouTube URLs stored in a string variable:

function getVideoUrlsFromString($html) {
    $regex = '#((?:www\.)?(?:youtube\.com\/(?:watch\?v=|embed\/)|youtu\.be\/)([a-zA-Z0-9-]*))#i';
    preg_match_all($regex, $html, $matches);
    $matches = array_unique($matches[0]);
    usort($matches, function($a, $b) {
        return strlen($b) - strlen($a);
    });
    return $matches;
}

$html = 'https://www.youtube-nocookie.com/embed/VWrlXsmcL2E';
$html = getVideoUrlsFromString($html);
print_r($html);

But it doesn't work with:

https://www.youtube-nocookie.com/embed/VWrlXsmcL2E
http://www.youtube.com/v/NLqAF9hrVbY?fs=1&hl=en_US

Is there any way to alter the regex so that it also matches these two common YouTube URL formats?

Upvotes: 2

Views: 1370

Answers (2)

npinti

Reputation: 52185

The problem is that your current expression does not account for the -nocookie part of the host in your first example, nor for the .../v/ path and the extra characters at the end of your second.

You can try changing it to something like this, which matches both of them:

((?:www\.)?(?:youtube(?:-nocookie)?\.com\/(?:v\/|watch\?v=|embed\/)|youtu\.be\/)([a-zA-Z0-9?&=_-]*))
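A quick way to check the pattern from this answer against both URLs in the question is sketched below. The variable names are only for illustration and do not come from the answer. Note that because the final character class now includes ?, & and =, the second capture group picks up the query string along with the video ID.

```php
<?php

// Pattern from this answer, wrapped in # delimiters with the i flag,
// the same way the question's code wraps its own pattern.
$regex = '#((?:www\.)?(?:youtube(?:-nocookie)?\.com\/(?:v\/|watch\?v=|embed\/)|youtu\.be\/)([a-zA-Z0-9?&=_-]*))#i';

// The two URLs from the question, one per line.
$text = "https://www.youtube-nocookie.com/embed/VWrlXsmcL2E\n"
      . "http://www.youtube.com/v/NLqAF9hrVbY?fs=1&hl=en_US";

preg_match_all($regex, $text, $found);

// $found[1] holds each matched URL (without the scheme);
// $found[2] holds everything after the path prefix, query string included.
print_r($found[1]);
print_r($found[2]);
```

If only the bare video ID is wanted, the ?, & and = characters would have to be left out of that last character class.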

Upvotes: 0

Alexander O'Mara

Reputation: 60527

Something like this should do the trick:

<?php

function getVideoUrlsFromString($html) {
    $regex = '#((?:www\.)?(?:youtube\.com\/(?:watch\?v=|embed\/|v\/)|youtu\.be\/|youtube\-nocookie\.com\/embed\/)([a-zA-Z0-9-]*))#i';
    preg_match_all($regex, $html, $matches);
    $matches = array_unique($matches[0]);
    usort($matches, function($a, $b) {
        return strlen($b) - strlen($a);
    });
    return $matches;
}

$html = '
    https://www.youtube-nocookie.com/embed/VWrlXsmcL2E
    http://www.youtube.com/v/NLqAF9hrVbY?fs=1&hl=en_US
';
$html = getVideoUrlsFromString($html);
print_r($html);

Output:

Array
(
    [0] => www.youtube-nocookie.com/embed/VWrlXsmcL2E
    [1] => www.youtube.com/v/NLqAF9hrVbY
)

Here's a diff of the two to see what was added:

#((?:www\.)?(?:youtube\.com\/(?:watch\?v=|embed\/    )|youtu\.be\/                                )([a-zA-Z0-9-]*))#i
#((?:www\.)?(?:youtube\.com\/(?:watch\?v=|embed\/|v\/)|youtu\.be\/|youtube\-nocookie\.com\/embed\/)([a-zA-Z0-9-]*))#i
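Two things the pattern above leaves out: YouTube video IDs can contain underscores as well as hyphens, and the match drops the http:// or https:// scheme. A possible variant is sketched below, assuming you want just the IDs and want the scheme consumed when present. The helper name getVideoIdsFromString and the third, made-up youtu.be URL are illustrative only and are not part of the original code.

```php
<?php

// Hypothetical variant: optional scheme, optional -nocookie host,
// and "_" allowed in the video ID.
function getVideoIdsFromString($html) {
    $regex = '#(?:https?://)?(?:www\.)?(?:youtube(?:-nocookie)?\.com/(?:watch\?v=|embed/|v/)|youtu\.be/)([a-zA-Z0-9_-]+)#i';
    preg_match_all($regex, $html, $matches);
    // Group 1 is the video ID; array_values reindexes after array_unique.
    return array_values(array_unique($matches[1]));
}

$ids = getVideoIdsFromString('
    https://www.youtube-nocookie.com/embed/VWrlXsmcL2E
    http://www.youtube.com/v/NLqAF9hrVbY?fs=1&hl=en_US
    https://youtu.be/a_b-c123XYZ
');
print_r($ids);
```

Matching stops at the first character outside the ID class, so the ?fs=1&hl=en_US query string in the second URL is not included in the result.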

Upvotes: 2
