Henrik Petterson

Reputation: 7094

Ignore parts of URL

I'm working on a simple script to extract the channel ID from a YouTube URL.

For example, to get the channel ID on this URL:

$url = 'https://youtube.com/channel/UCBLAoqCQyz6a0OvwXWzKZag';

I use regex:

preg_match( '/\/channel\/(([^\/])+?)$/', $url, $matches );

Works fine. But if the URL has any extra parameters or anything else after the channel ID, it doesn't work. Example:

https://youtube.com/channel/UCBLAoqCQyz6a0OvwXWzKZag?PARAMETER=HELLO
https://youtube.com/channel/UCBLAoqCQyz6a0OvwXWzKZag/RANDOMFOLDER
etc...

My question is: how can I adjust my regex so that it works with those URLs? The match should contain only the channel ID, not the extra parameters or path segments.

Feel free to test my ideone code.

Upvotes: 2

Views: 398

Answers (1)

Wiktor Stribiżew

Reputation: 626699

You can fix the regexps in the following way:

$preg_entities = [
    'channel_id' => '\/channel\/([^\/?#]+)', // match YouTube channel ID from URL
    'user'       => '\/user\/([^\/?#]+)',    // match YouTube user from URL
];

See the PHP demo.

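The [^\/?#]+ pattern is a negated character class: it matches one or more characters other than /, ? and #. The capture therefore stops before any further path segment, query string, or fragment in a URL, and you get clean values in the output.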
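
As a minimal sketch of that behaviour, the snippet below applies the channel pattern to the three URL shapes from the question. The helper name extractChannelId is hypothetical and exists only for this illustration; it is not part of the original code.

```php
<?php
// Hypothetical helper for illustration: returns the channel ID, or null if none is found.
function extractChannelId(string $url): ?string {
    // [^\/?#]+ stops at the first "/", "?" or "#", so a trailing path segment,
    // query string, or fragment never becomes part of the capture.
    if (preg_match('/\/channel\/([^\/?#]+)/', $url, $matches)) {
        return $matches[1];
    }
    return null;
}

$urls = [
    'https://youtube.com/channel/UCBLAoqCQyz6a0OvwXWzKZag',
    'https://youtube.com/channel/UCBLAoqCQyz6a0OvwXWzKZag?PARAMETER=HELLO',
    'https://youtube.com/channel/UCBLAoqCQyz6a0OvwXWzKZag/RANDOMFOLDER',
];

foreach ($urls as $url) {
    // Each iteration prints UCBLAoqCQyz6a0OvwXWzKZag
    echo extractChannelId($url), PHP_EOL;
}
```

All three URLs yield the same ID because the capture ends at whichever delimiter comes first.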

Full code snippet:

function getYouTubeXMLUrl( $url ) {
    $xml_youtube_url_base = 'https://youtube.com/feeds/videos.xml';
    $preg_entities        = [
        'channel_id' => '\/channel\/([^\/?#]+)', // match YouTube channel ID from URL
        'user'       => '\/user\/([^\/?#]+)',    // match YouTube user from URL
    ];

    foreach ( $preg_entities as $key => $preg_entity ) {
        // Group 1 is mandatory in both patterns, so a successful match always sets $matches[1].
        if ( preg_match( '/' . $preg_entity . '/', $url, $matches ) ) {
            return [
                'rss'  => $xml_youtube_url_base . '?' . $key . '=' . $matches[1],
                'id'   => $matches[1],
                'type' => $key,
            ];
        }
    }

    return null; // neither a channel nor a user segment was found
}

Test:

$url = 'https://youtube.com/channel/UCBLAoqCQyz6a0OvwXWzKZag?PARAMETER=HELLO';
print_r(getYouTubeXMLUrl($url));
// => Array( [rss] => https://youtube.com/feeds/videos.xml?channel_id=UCBLAoqCQyz6a0OvwXWzKZag [id] => UCBLAoqCQyz6a0OvwXWzKZag [type] => channel_id )
$url = 'https://youtube.com/channel/UCBLAoqCQyz6a0OvwXWzKZag/RANDOMFOLDER';
print_r(getYouTubeXMLUrl($url));
// => Array( [rss] => https://youtube.com/feeds/videos.xml?channel_id=UCBLAoqCQyz6a0OvwXWzKZag [id] => UCBLAoqCQyz6a0OvwXWzKZag [type] => channel_id )
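
A different technique, not the one used in the answer above, is to isolate the path with PHP's built-in parse_url() first and then match against the path alone, so the query string and fragment are gone before the regex runs. This is only a sketch under that assumption; the function name extractChannelIdFromPath is hypothetical.

```php
<?php
// Hypothetical alternative for illustration: strip query/fragment via parse_url(),
// then read the segment that follows "/channel/" at the start of the path.
function extractChannelIdFromPath(string $url): ?string {
    // PHP_URL_PATH yields a string, null when the URL has no path, or false when malformed.
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path)) {
        return null;
    }
    // "~" is used as the delimiter so the slashes need no escaping.
    if (preg_match('~^/channel/([^/]+)~', $path, $matches)) {
        return $matches[1];
    }
    return null;
}

// Prints UCBLAoqCQyz6a0OvwXWzKZag
echo extractChannelIdFromPath('https://youtube.com/channel/UCBLAoqCQyz6a0OvwXWzKZag/RANDOMFOLDER?PARAMETER=HELLO'), PHP_EOL;
```

Unlike the unanchored pattern in the answer, this variant only accepts /channel/ at the start of the path, which may or may not be what you want.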

Upvotes: 1
