Reputation: 16142
I'm trying to convert this python regex to a javascript regex
r"""(?x)^
(
(?:https?://|//)? # http(s):// or protocol-independent URL (optional)
(?:(?:(?:(?:\w+\.)?[yY][oO][uU][tT][uU][bB][eE](?:-nocookie)?\.com/|
(?:www\.)?deturl\.com/www\.youtube\.com/|
(?:www\.)?pwnyoutube\.com/|
(?:www\.)?yourepeat\.com/|
tube\.majestyc\.net/|
youtube\.googleapis\.com/) # the various hostnames, with wildcard subdomains
(?:.*?\#/)? # handle anchor (#/) redirect urls
(?: # the various things that can precede the ID:
(?:(?:v|embed|e)/) # v/ or embed/ or e/
|(?: # or the v= param in all its forms
(?:(?:watch|movie)(?:_popup)?(?:\.php)?/?)? # preceding watch(_popup|.php) or nothing (like /?v=xxxx)
(?:\?|\#!?) # the params delimiter ? or # or #!
(?:.*?&)? # any other preceding param (like /?s=tuff&v=xxxx)
v=
)
))
|youtu\.be/ # just youtu.be/xxxx
|https?://(?:www\.)?cleanvideosearch\.com/media/action/yt/watch\?videoId=
)
)? # all until now is optional -> you can pass the naked ID
([0-9A-Za-z_-]{11}) # here is it! the YouTube video ID
(?(1).+)? # if we found the ID, everything can follow
$"""
I removed the quotes at start and end, added start /^
and end delimiters /i
, escaped forward slashes, removed the free-spacing mode and ended up with this
var VALID_URL = /^((?:https?:\/\/|\/\/)?(?:(?:(?:(?:\w+\.)?[yY][oO][uU][tT][uU][bB][eE](?:-nocookie)?\.com\/|(?:www\.)?deturl\.com\/www\.youtube\.com\/|(?:www\.)?pwnyoutube\.com\/|(?:www\.)?yourepeat\.com\/|tube\.majestyc\.net\/|youtube\.googleapis\.com\/)(?:.*?\#\/)?(?:(?:(?:v|embed|e)\/)|(?:(?:(?:watch|movie)(?:_popup)?(?:\.php)?\/?)?(?:\?|\#!?)(?:.*?&)?v=)))|youtu\.be\/|https?:\/\/(?:www\.)?cleanvideosearch\.com\/media\/action\/yt\/watch\?videoId=))?([0-9A-Za-z_-]{11})(?(1).+)?$/g;
However the javascript regex debugger I'm using says Unexpected character "(" after "?"
in regards to the javascript transpose of this part of the python regex
(?(1).+)? # if we found the ID, everything can follow
Any idea how I can resolve this error?
Upvotes: 0
Views: 107
Reputation: 41838
JavaScript does not support conditionals.
But the world of regex has long survived without conditionals, and there are ways around it.
The Idea
The basic structure of that scary regex was this:
(Capture A)? (Match B) ( If A was captured, (Match C)? )
You can translate the IF into an OR:
(Capture A) (Match B) (Match C)? **OR** (Match B)
Converted Regex
Try this:
^((?:https?://|//)?(?:(?:(?:(?:\w+\.)?[yY][oO][uU][tT][uU][bB][eE](?:-nocookie)?\.com/|(?:www\.)?deturl\.com/www\.youtube\.com/|(?:www\.)?pwnyoutube\.com/|(?:www\.)?yourepeat\.com/|tube\.majestyc\.net/|youtube\.googleapis\.com/)(?:[^\n]*?#/)?(?:(?:(?:v|embed|e)/)|(?:(?:(?:watch|movie)(?:_popup)?(?:\.php)?/?)?(?:\?|#!?)(?:[^\n]*?&)?v=)))|youtu\.be/|https?://(?:www\.)?cleanvideosearch\.com/media/action/yt/watch\?videoId=)([0-9A-Za-z_-]{11})(?:[^\n]+)?)|^([0-9A-Za-z_-]{11})
Explanation
The (?(1)[^\n]+)?
conditional tries to optionally match [^\n]+
if Group 1 is set. Since it occurs after the non-optional ([0-9A-Za-z_-]{11})
, I transformed the conditional into an alternation |
([0-9A-Za-z_-]{11})
and the optional component, OR([0-9A-Za-z_-]{11})
([0-9A-Za-z_-]{11})
, depending on which side of the alternation matches it, it will live inside a different capture Group. I'll leave you to count the parentheses.Reference
Upvotes: 1