Escape quotation mark in regex expression

I'm trying to extract a URL to download using a regular expression, but I cannot get the quotation marks right in the lookbehind and the positive lookahead.

Can you fix it?

Input: document.getElementsByClassName('mdui-textfield-input')[1].innerHTML

Output: "<video><source src=\"https://drivebutler.drk1.workers.dev/0:/Cartoon%20Collection/Naruto%20Shippuden%20(Complete%20Series%20001-500)%20Naruto%20Shippuuden%20[1080p]%20[HEVC]%20[x265]%20[Batch]%20[pseudo]/Season%2015%20(Episodes%20321-348)/[AnimeRG]%20Naruto%20Shippuden%20-%20338%20[1080p]%20[x265]%20[pseudo].mkv\" type=\"video/mp4\"></video>"

The regex I use to grab the URL:

(?<=src=\\\").*?(?=\\\")

What I've tried:

document.getElementsByClassName('mdui-textfield-input')[1].innerHTML.match((?<=src=\\\").*?(?=\\\"))[0]

But the console output suggests that something is wrong.

Test it!

Upvotes: 0

Views: 114

Answers (3)

Peter Seliger

Reputation: 13356

Use /src=\\"(?<url>https?:\/\/[^"]+)"/ and always bear in mind how backslashes behave when they are written inside a string literal, as opposed to how the system displays them in output values:

const sample = "&lt;video&gt;&lt;source src=\\\"https://drivebutler.drk1.workers.dev/0:/Cartoon%20Collection/Naruto%20Shippuden%20(Complete%20Series%20001-500)%20Naruto%20Shippuuden%20[1080p]%20[HEVC]%20[x265]%20[Batch]%20[pseudo]/Season%2015%20(Episodes%20321-348)/[AnimeRG]%20Naruto%20Shippuden%20-%20338%20[1080p]%20[x265]%20[pseudo].mkv\" type=\"video/mp4\"&gt;&lt;/video&gt;"

const regXExtractUrl = (/src=\\"(?<url>https?:\/\/[^"]+)"/);

console.log(
  regXExtractUrl.exec(sample)?.groups.url
);
console.log(
  regXExtractUrl.exec("")?.groups.url
);

Different escaping in the input calls for a different regex:

const sample_A = "&lt;video&gt;&lt;source src=\"https://drivebutler.drk1.workers.dev/0:/Cartoon%20Collection/Naruto%20Shippuden%20(Complete%20Series%20001-500)%20Naruto%20Shippuuden%20[1080p]%20[HEVC]%20[x265]%20[Batch]%20[pseudo]/Season%2015%20(Episodes%20321-348)/[AnimeRG]%20Naruto%20Shippuden%20-%20338%20[1080p]%20[x265]%20[pseudo].mkv\" type=\"video/mp4\"&gt;&lt;/video&gt;"

const sample_B = `&lt;video&gt;&lt;source src="https://drivebutler.drk1.workers.dev/0:/Cartoon%20Collection/Naruto%20Shippuden%20(Complete%20Series%20001-500)%20Naruto%20Shippuuden%20[1080p]%20[HEVC]%20[x265]%20[Batch]%20[pseudo]/Season%2015%20(Episodes%20321-348)/[AnimeRG]%20Naruto%20Shippuden%20-%20338%20[1080p]%20[x265]%20[pseudo].mkv" type="video/mp4"&gt;&lt;/video&gt;`

const regXExtractUrl = (/src="(?<url>https?:\/\/[^"]+)"/);

console.log(
  regXExtractUrl.exec(sample_A)?.groups.url
);
console.log(
  regXExtractUrl.exec(sample_B)?.groups.url
);
console.log(
  regXExtractUrl.exec("")?.groups.url
);
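
The difference between the two variants above comes down to which characters actually end up in the string once JavaScript has parsed the literal. A minimal sketch, assuming shortened placeholder inputs (the value `x` and all names here are illustrative and not taken from the original page):

```javascript
// "\"" in a string literal is one character: a quotation mark.
// "\\\"" is two characters: a backslash followed by a quotation mark.
const withPlainQuote = "src=\"x\"";       // actual text: src="x"
const withBackslashQuote = "src=\\\"x\""; // actual text: src=\"x"

// Matches src= followed directly by a quotation mark.
const plainRegex = /src="(?<url>[^"]+)"/;
// Matches src= followed by a literal backslash and then a quotation mark.
const backslashRegex = /src=\\"(?<url>[^"]+)"/;

// Hypothetical helper: returns the captured url group or undefined.
function extractUrl(regex, text) {
  return regex.exec(text)?.groups.url;
}

console.log(withPlainQuote.length, withBackslashQuote.length);
console.log(extractUrl(plainRegex, withPlainQuote));
console.log(extractUrl(backslashRegex, withBackslashQuote));
console.log(extractUrl(plainRegex, withBackslashQuote));
```

Each regex only matches the input whose real characters it describes, so it helps to check the string's length or log it before choosing the pattern.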

Upvotes: 1

wjatek

Reputation: 1006

You didn't enclose your regular expression between slashes like this:

.match(/(?<=src=\\\").*?(?=\\\")/)

For details, see the JavaScript documentation on creating regular expressions with literal notation.

To escape a special character, use a single backslash. Your pattern \\\" escapes one backslash and one quotation mark, so it only matches a literal backslash followed by a quotation mark. I think you want it to be like this:

.match(/(?<=src=\").*?(?=\")/)

In a regex literal, however, you do not need to escape quotation marks at all.
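
To illustrate the points above, here is a small sketch comparing the three spellings of the pattern. The sample markup and the example.com URL are made up for the demonstration:

```javascript
// Hypothetical markup containing plain quotation marks, no backslashes.
const html = '<source src="https://example.com/video.mkv" type="video/mp4">';

// Escaped and unescaped quotation marks behave identically in a regex literal.
const escapedQuote = /(?<=src=\").*?(?=\")/;
const plainQuote = /(?<=src=").*?(?=")/;
// This one demands a literal backslash before each quotation mark.
const backslashThenQuote = /(?<=src=\\\").*?(?=\\\")/;

const a = html.match(escapedQuote);
const b = html.match(plainQuote);
const c = html.match(backslashThenQuote);

console.log(a[0]);
console.log(b[0]);
console.log(c); // null, because the markup contains no backslashes
```

The first two patterns find the same URL, while the third finds nothing unless the text really contains backslashes.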

Upvotes: 1

I fixed it with the following:

document.getElementsByClassName('mdui-textfield-input')[1].innerHTML.match(/(?<=src=\").*?(?=\")/g)[0]
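
One caveat with indexing the result directly: match returns null when nothing is found, so [0] would throw. A hedged sketch of a guarded variant follows; the helper name firstSrcUrl and the example.com input are assumptions for illustration only:

```javascript
// Hypothetical helper: returns the first src value, or null when there is none.
function firstSrcUrl(markup) {
  // Without the g flag, match returns the first match (or null).
  const match = markup.match(/(?<=src=").*?(?=")/);
  return match ? match[0] : null;
}

console.log(firstSrcUrl('<source src="https://example.com/a.mkv" type="video/mp4">'));
console.log(firstSrcUrl(''));
```

In the browser it could be called as firstSrcUrl(document.getElementsByClassName('mdui-textfield-input')[1].innerHTML), assuming that element exists on the page.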

Upvotes: 0
