user5164720

Regex: Scrub a YouTube URL within a string, leaving only the YouTube video code

I have text that contains a YouTube URL. I need to remove every part of the link except the YouTube video code. The URL is either surrounded by whitespace or sits at the very start or end of the text; no non-blank characters adjoin it.

SAMPLE:

$txt = "This text contain this link: https://www.youtube.com/watch?v=b8ri14rw32c&rel=0 and so on...";

EXTRACTING ID:

$pattern = '#(?<=v=|v\/|vi=|vi\/|youtu\.be\/)[a-zA-Z0-9_-]{11}#';
preg_match_all($pattern, $txt, $matches);
print_r($matches);

EXPECTED:

Array
(
    [0] = "This text contain this link b8ri14rw32c and so on..."
)

Upvotes: 1

Views: 846

Answers (2)

Michael Gaskill

Reputation: 8042

You can try this pattern to match:

https:\/\/(?:www\.)?youtu(?:be\.com|\.be)\/(?:watch\?vi?[=\/])?([\w-]{11})(?:&\w+=[^&\s]*)*

There is exactly one capture group in this expression, and it holds the YouTube video code. Use it with a regex replace to substitute the captured video code for the entire link text. The character class [\w-] is used because video codes can contain hyphens as well as letters, digits, and underscores.

This regex works with these YouTube URL formats:

https://www.youtube.com/watch?v=b8ri14rw32c&rel=0
https://youtu.be/Rk_sAHh9s08

Other YouTube URL formats have not been tested, but could easily be supported if needed.

This PHP code tests the replacement using preg_replace:

$txt = "This text contain this link: https://www.youtube.com/watch?v=b8ri14rw32c&rel=0 and so on...";
$pattern = '/https:\/\/(?:www\.)?youtu(?:be\.com|\.be)\/(?:watch\?vi?[=\/])?([\w-]{11})(?:&\w+=[^&\s]*)*/';
$text = preg_replace($pattern, '$1', $txt);
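
For a quick check against more than one input, the replacement can be wrapped in a small function. This is only a sketch: scrubYouTubeLinks is a hypothetical name, the pattern escapes the dot after www and uses [\w-]{11} so that codes containing a hyphen also match, and the third sample URL uses a made-up video code.

```php
<?php
/**
 * Replace every matching YouTube link in $text with its 11-character video code.
 * scrubYouTubeLinks is a hypothetical helper name, used here for illustration only.
 */
function scrubYouTubeLinks(string $text): string
{
    // Group 1 captures the video code; trailing "&name=value" pairs are consumed and dropped.
    $pattern = '/https:\/\/(?:www\.)?youtu(?:be\.com|\.be)\/(?:watch\?vi?[=\/])?([\w-]{11})(?:&\w+=[^&\s]*)*/';
    $result = preg_replace($pattern, '$1', $text);

    // preg_replace returns null on error; fall back to the unchanged input in that case.
    return $result ?? $text;
}

$samples = [
    'This text contain this link: https://www.youtube.com/watch?v=b8ri14rw32c&rel=0 and so on...',
    'See https://youtu.be/Rk_sAHh9s08 now',
    'Made-up code with a hyphen: https://youtu.be/abc-DEF_123',
];

foreach ($samples as $sample) {
    echo scrubYouTubeLinks($sample), PHP_EOL;
}
```

With the first sample, the link and its &rel=0 parameter collapse to b8ri14rw32c while the surrounding text, including the colon, is left untouched.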

Upvotes: 2

James Buck

Reputation: 1640

If I understood you correctly, the following should work for normal (unshortened) YouTube links.

https?:\/\/[^\s]+[?&]v=([^&\s]+)[^\s]*

Replace with \1 (capturing group 1).

Regex demo.
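
In PHP, this replacement might look like the sketch below. replaceVParamLinks is a hypothetical name and the example.com URL is made up. Note that the pattern keys on the v= query parameter rather than on the host, so it rewrites any URL that carries one, and it leaves shortened youtu.be links alone.

```php
<?php
/**
 * Replace any URL that carries a "v" query parameter with that parameter's value.
 * replaceVParamLinks is a hypothetical helper name, used here for illustration only.
 */
function replaceVParamLinks(string $text): string
{
    // Group 1 is everything after "v=" up to the next "&" or whitespace.
    $pattern = '/https?:\/\/[^\s]+[?&]v=([^&\s]+)[^\s]*/';

    // '$1' is PHP's equivalent of \1; fall back to the input if preg_replace fails.
    return preg_replace($pattern, '$1', $text) ?? $text;
}

// Normal YouTube link: replaced by its video code.
echo replaceVParamLinks('This text contain this link: https://www.youtube.com/watch?v=b8ri14rw32c&rel=0 and so on...'), PHP_EOL;

// Shortened link: has no "v=" parameter, so it is left as-is.
echo replaceVParamLinks('See https://youtu.be/Rk_sAHh9s08 now'), PHP_EOL;

// Non-YouTube URL with a "v" parameter: also replaced, since the host is not checked.
echo replaceVParamLinks('Visit https://example.com/page?v=123 today'), PHP_EOL;
```

If only YouTube hosts should be rewritten, the host part of the pattern would need to be made explicit rather than matched with [^\s]+.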

Upvotes: 2
