Reputation: 7104

Grab URL within a string which contains HTML code

I have a string, for example:

$html = '<p>hello<a href="https://www.youtube.com/watch?v=7HknMcG2qYo">world</a></p><p>hello<a href="https://youtube.com/watch?v=37373o">world</a></p>';

And I want to search the string for the first URL that starts with youtube.com or youtu.be and store it in variable $first_found_youtube_url.

How can I do this efficiently?

I could use preg_match or strpos to look for the URLs, but I'm not sure which approach is more appropriate.

Upvotes: 3

Answers (3)

hanshenrik

Reputation: 21675

You can parse the HTML with DOMDocument and look for YouTube URLs with stripos, something like this:

$html = '<p>hello<a href="https://www.youtube.com/watch?v=7HknMcG2qYo">world</a></p><p>hello<a href="https://youtube.com/watch?v=37373o">world</a></p>';
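$DOMD = new DOMDocument();
// loadHTML() is an instance method; @ silences warnings about imperfect markup.
@$DOMD->loadHTML($html);

foreach ($DOMD->getElementsByTagName("a") as $url)
{
    $href = $url->getAttribute("href");
    // stripos() returns 0 only when $href starts with the prefix (case-insensitive).
    if (0 === stripos($href, "https://www.youtube.com/") || 0 === stripos($href, "https://youtube.com/") || 0 === stripos($href, "https://youtu.be/"))
    {
        $first_found_youtube_url = $href;
        break;
    }
}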

Personally, I would probably use

"youtube.com"===parse_url($url->getAttribute("href"),PHP_URL_HOST)

instead, as it matches both http and https links. That is probably what you want, though strictly speaking it is not what the question asks for right now. Note that parse_url returns the full host, so "www.youtube.com" and "youtu.be" would each need their own comparison.
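
To make the parse_url idea concrete, here is a minimal sketch that compares the host against a small allow-list. The function name firstYoutubeUrl and the list of hosts are my own assumptions for illustration, not something from the question, so adjust them to the domains you actually care about.

```php
<?php
// Hypothetical helper: returns the first <a href> whose host is a YouTube
// domain, or null if there is none.
function firstYoutubeUrl(string $html): ?string
{
    // Assumed allow-list; extend it (e.g. m.youtube.com) as needed.
    $allowedHosts = ['youtube.com', 'www.youtube.com', 'youtu.be'];

    $dom = new DOMDocument();
    // Collect libxml warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($dom->getElementsByTagName('a') as $anchor) {
        $href = $anchor->getAttribute('href');
        // parse_url() gives null when there is no host, false on a malformed URL.
        $host = parse_url($href, PHP_URL_HOST);
        if (is_string($host) && in_array(strtolower($host), $allowedHosts, true)) {
            return $href; // anchors are visited in document order
        }
    }
    return null;
}
```

Called on the $html string from the question, this should return the https://www.youtube.com/watch?v=7HknMcG2qYo link, and it accepts http as well as https because only the host is compared.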

Upvotes: 1

the_velour_fog

Reputation: 2184

I think this will do what you are looking for. I used preg_match_all simply because I find it easier to debug the regexes.

<?php

$html = '<p>hello<a href="https://www.youtu.be/watch?v=7HknMcG2qYo">world</a></p><p>hello<a href="https://youtube.com/watch?v=37373o">world</a></p>';

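// Matches http/https, an optional "www.", and either youtu.be or youtube.com.
$pattern = '/https?:\/\/(?:www\.)?(?:youtu\.be|youtube\.com)\/[a-zA-Z0-9?=_-]*/i';
preg_match_all($pattern, $html, $matches);

// print_r($matches);
// $matches[0] holds the full matches in document order; null if nothing matched.
$first_found_youtube = $matches[0][0] ?? null;
echo $first_found_youtube;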

demo - https://3v4l.org/lFjmK
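
If only the first match is needed, preg_match stops at the first hit, so the full list never has to be built. A minimal sketch along those lines follows; the function name firstYoutubeUrlByRegex and the choice to end the URL at whitespace, a quote, or an angle bracket are my assumptions, not part of the original answer.

```php
<?php
// Hypothetical helper: returns the first YouTube URL found in the string,
// or null when there is no match.
function firstYoutubeUrlByRegex(string $html): ?string
{
    // "~" as delimiter avoids escaping the slashes. The URL is assumed to end
    // at whitespace, a quote, or an angle bracket.
    $pattern = '~https?://(?:www\.)?(?:youtube\.com|youtu\.be)/[^\s"\'<>]*~i';

    // preg_match() returns 1 on a match, 0 on no match, false on error.
    if (preg_match($pattern, $html, $match) === 1) {
        return $match[0];
    }
    return null;
}
```

Keep in mind that this scans the raw markup, so it would also pick up a URL that appears in plain text or in a comment rather than in an href attribute.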

Upvotes: 0

skrilled

Reputation: 5371

I wrote this function a while back. It uses a regex and returns an array of unique URLs. Since you want the first one, you can just use the first item in the array.

function getUrlsFromString($string) {
    $regex = '#\bhttps?://[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/))#i';
    preg_match_all($regex, $string, $matches);
    // Drop duplicates and reindex, keeping document order so that
    // index 0 is the first URL found (sorting by length would reorder them).
    return array_values(array_unique($matches[0]));
}

Example:

$html = '<p>hello<a href="https://www.youtube.com/watch?v=7HknMcG2qYo">world</a></p><p>hello<a href="https://youtube.com/watch?v=37373o">world</a></p>';
$urls = getUrlsFromString($html);
$first_found_youtube = $urls[0];

With a YouTube-specific regex:

function getYoutubeUrlsFromString($string) {
    // Dots are escaped so they match literally; "_" and "-" are allowed in video IDs.
    $regex = '#(https?://(?:www\.)?(?:youtube\.com/watch\?v=|youtu\.be/)([a-zA-Z0-9_-]*))#i';
    preg_match_all($regex, $string, $matches);
    // Drop duplicates and reindex, keeping document order so that
    // index 0 is the first URL found (sorting by length would reorder them).
    return array_values(array_unique($matches[0]));
}

Example:

$html = '<p>hello<a href="https://www.youtube.com/watch?v=7HknMcG2qYo">world</a></p><p>hello<a href="https://youtube.com/watch?v=37373o">world</a></p>';
$urls = getYoutubeUrlsFromString($html);
$first_found_youtube = $urls[0];

Upvotes: 4
