Reputation: 149
I'm trying to extract the src URL/path without the quotes, but only when it points to an image:
- src="/path/image.png" // should capture => /path/image.png
- src="/path/image.bmp" // should capture => /path/image.bmp
- src="/path/image.jpg" // should capture => /path/image.jpg
- src="https://www.site1.com" // should NOT capture
So far I have /src="(.*)"/g, but that obviously captures both the image paths and the plain URL. I have been looking at lookbehind and lookahead but just can't put it together.
Upvotes: 1
Views: 157
Reputation: 1808
Try /src="(.*(?:jpg|bmp|png))"/g
You'll need to list every extension you consider a valid image.
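As a rough sketch of how this pattern behaves, the snippet below runs it over the four samples from the question. The helper name extractImageSrcs and the extra img line are made up for illustration. One thing to watch: .* is greedy, so on a line with several quoted attributes it can run past the closing quote of src.

```javascript
// Pattern from the answer above; the g flag is required by matchAll.
const pattern = /src="(.*(?:jpg|bmp|png))"/g;

// Hypothetical helper: collect capture group 1 from every match.
function extractImageSrcs(text) {
  return Array.from(text.matchAll(pattern), (m) => m[1]);
}

// The four samples from the question, one per line
// (. does not match a newline, so each line is matched separately).
const samples = [
  'src="/path/image.png"',
  'src="/path/image.bmp"',
  'src="/path/image.jpg"',
  'src="https://www.site1.com"',
].join("\n");

const found = extractImageSrcs(samples);
console.log(found); // [ '/path/image.png', '/path/image.bmp', '/path/image.jpg' ]

// Assumed extra input: two quoted attributes on one line.
// The greedy .* swallows everything up to the last image extension.
const greedy = extractImageSrcs('<img src="/a.png" alt="x.png">');
console.log(greedy); // [ '/a.png" alt="x.png' ]
```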
Upvotes: 2
Reputation: 163342
You can use a capture group, and you should prevent the match from crossing the closing " by using a negated character class.
If you want to match either href or src:
\b(?:href|src)="([^\s"]*\.(?:png|jpg|bmp))"
Explanation
- \b A word boundary to prevent a partial word match
- (?:href|src)=" Match either href=" or src="
- ( Capture group 1
- [^\s"]* Match optional chars other than a whitespace char or "
- \.(?:png|jpg|bmp) Match one of .png, .jpg or .bmp
- ) Close group 1
- " Match literally
const regex = /\b(?:href|src)="([^\s"]*\.(?:png|jpg|bmp))"/;
[
  'src="/path/image.png" test "',
  'src="/path/image.bmp"',
  'src="/path/image.jpg"',
  'src="https://www.site1.com"',
  'href="image.png"'
].forEach(s => {
  const m = s.match(regex);
  if (m) {
    // m[1] is capture group 1: the path without the quotes
    console.log(m[1]);
  }
});
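If the attributes sit inside one larger string rather than in separate strings, the same pattern with the g flag plus matchAll should pull out every image path in one pass. This is only a sketch: the html string and the variable names are invented for the example.

```javascript
// Same pattern as above, with the g flag added so matchAll can iterate.
const attrPattern = /\b(?:href|src)="([^\s"]*\.(?:png|jpg|bmp))"/g;

// Assumed sample markup, not taken from the question.
const html =
  '<img src="/a.png" alt="x.png"> ' +
  '<a href="/b.jpg">b</a> ' +
  '<script src="https://www.site1.com"></script>';

// Keep only capture group 1 of each match.
const paths = Array.from(html.matchAll(attrPattern), (m) => m[1]);

// alt="x.png" is skipped (not href/src); the script URL has no image extension.
console.log(paths); // [ '/a.png', '/b.jpg' ]
```

Because [^\s"]* cannot cross a quote, each match stays inside its own attribute value. Add the i flag if extensions such as .PNG should match as well.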
Upvotes: 4
Reputation: 3210
If you want it to be a bit more foolproof, you can use lookbehinds and lookaheads. Expand the extension list png|bmp|jpg to test for more extensions.
/(?<=src=").*(png|bmp|jpg)(?=")/g
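A short sketch of what the lookaround version gives you: since src=" and the closing " are only asserted, not consumed, the whole match is already the bare path, so String.prototype.match with the g flag returns the paths directly. Lookbehind is an ES2018 feature, so older engines may reject the pattern. The second pattern (tight) is my own assumed variant, not part of the answer: it swaps .* for a negated class and requires a literal dot before the extension.

```javascript
// Pattern from the answer above.
const lookaround = /(?<=src=").*(png|bmp|jpg)(?=")/g;

// The four samples from the question, one per line.
const sampleText = [
  'src="/path/image.png"',
  'src="/path/image.bmp"',
  'src="/path/image.jpg"',
  'src="https://www.site1.com"',
].join("\n");

// With g, match() returns the full matches (or null if there are none).
const whole = sampleText.match(lookaround) || [];
console.log(whole); // [ '/path/image.png', '/path/image.bmp', '/path/image.jpg' ]

// Assumed tighter variant: cannot cross a quote, needs ".ext" at the end.
const tight = /(?<=\bsrc=")[^"\s]*\.(?:png|bmp|jpg)(?=")/g;
const tightResult = '<img src="/a.png" alt="x.png">'.match(tight) || [];
console.log(tightResult); // [ '/a.png' ]
```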
Upvotes: 1