Grofit
Grofit

Reputation: 18435

Only match regex if it doesnt start with a pattern in javascript

I have a bit of a strange one here, I basically have a large chunk of text which may or may not contain links to images.

So lets say it does I have a pattern which will extract the image url fine, however once a match is found it is replaced with a element with the link as the src. Now the problem is there may be multiple matches within the text and this is where it gets tricky. As the url pattern will now match the src tags url, which will basically just enter an infinite loop.

So is there a way to ONLY match in regex if it doesnt start with a pattern like ="|=' ? as then it would match the url in something like:

some image http://cdn.sstatic.net/stackoverflow/img/sprites.png?v=6

but not

some image <img src="http://cdn.sstatic.net/stackoverflow/img/sprites.png?v=6">

I am not sure if it is possible, but if it is could someone point me in the right direction? A replace by itself will not suffice in this scenario as the url matched needs to be used elsewhere too so it needs to be used like a capture.

The main scenarios I need to account for are:

== edit ==

Here is the current regex I am using to match urls:

(\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*(?:png|jpeg|jpg|gif|bmp))

== edit 2 ==

Just so everyone understands why I cannot use the /g command here is an answer which explains the issue, if I could use this /g like I originally tried then it would make things a lot simpler.

Javascript regex multiple captures again

Upvotes: 1

Views: 242

Answers (4)

GuiDocs
GuiDocs

Reputation: 732

as was said by freefaller, you might use /g flag to just find all matches in one go, if exec is not a must.

otherwise: you can add (="|=')? to the beginning of your regex, and check if $1 is undefined. if it is undefined, then it was not started with a ="|=' pattern

Upvotes: 0

freefaller
freefaller

Reputation: 19953

Using the /ig command at the end should work... the g is for global replace and the i is for case-insensitivity, which is necessary as you've only got A-Z instead of a-zA-Z.

Using the following vanilla JS appears to work for me (see jsfiddle)...

var test="some image http://cdn.sstatic.net/stackoverflow/img/sprites.png?v=6 some image http://cdn.sstatic.net/stackoverflow/img/sprites.png?v=6 some image http://cdn.sstatic.net/stackoverflow/img/sprites.png?v=6";
var re = new RegExp(/(\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*(?:png|jpeg|jpg|gif|bmp))/ig);
document.getElementById("output").innerHTML = test.replace(re,"<img src=\"$1\"/>");

Although, what it does highlight is that the query string part of the URL (the ?v=6 is not being picked up with your RegEx).

For jQuery, it would be (see jsfiddle)...

$(document).ready(function(){
  var test="some image http://cdn.sstatic.net/stackoverflow/img/sprites.png?v=6 some image http://cdn.sstatic.net/stackoverflow/img/sprites.png?v=6 some image http://cdn.sstatic.net/stackoverflow/img/sprites.png?v=6";
  var re = new RegExp(/(\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*(?:png|jpeg|jpg|gif|bmp))/ig);
  $("#output").html(test.replace(re,"<img src=\"$1\"/>"));
});

Update

Just in case my example of using the same image URL in the example doesn't convince you - it also works with different URLs... see this jsfiddle update

var test="http://cdn.sstatic.net/stackoverflow/img/sprites.png?v=6 http://cdn.sstatic.net/serverfault/img/sprites.png?v=7";
var re = new RegExp(/(\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*(?:png|jpeg|jpg|gif|bmp))/ig);
document.getElementById("output").innerHTML = test.replace(re,"<img src=\"$1\"/>");

Upvotes: 1

Tomke
Tomke

Reputation: 38

Couldn't you just see if there is a whitespace in front of the url, instead of that word-boundary? seems to work, although you will have to remove the matched whitespace later.

(\s(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*(?:png|jpeg|jpg|gif|bmp))

http://rubular.com/r/9wSc0HNWas

Edit: Damn, too slow :) I'll still leave this here as my regex is shorter ;)

Upvotes: 0

Ibrahim Najjar
Ibrahim Najjar

Reputation: 19423

What you are looking for is a negative look behind, but Javascript doesn't support any kind of look behinds, so you will either have to use a callback function to check what was matched and make sure it is not preceded by a ' or ", or you can use the following regex:

(?:^|[^"'])(\b(https?|ftp|file):\/\/[-a-zA-Z0-9+&@#\/%?=~_|!:,.;]*(?:png|jpeg|jpg|gif|bmp))

which has a single problem, that is in the case of a successful match it will catch one more character, the one right before the (\b(https?|ftp|file) pattern in the input, but I think you can deal with this easily.

Regex101 Demo

Upvotes: 3

Related Questions