Reputation: 4908
I'm getting content from a textarea (which, besides plain text, may include HTML markup) and trying to parse it and replace all occurrences of "[File:#xx#]" with a string contained in an array.
So, let's say the contents of the textarea are in var html.
I do the following:
html = html.replace(/\[File:#(.*)#\]/g,
function($0, $1){ return furls[$1]; });
Everything works fine when the contents of the textarea are like this:
<img src="[File:#111#]" alt="image1" />
<img src="[File:#222#]" />
But when there is no line break between two elements which have an attribute with a [File:#xx#] value, the problem appears.
So, given this as the textarea's value:
<img src="[File:#111#]" alt="image1" /><img src="[File:#222#]" />
It seems to match the start of the first img's [File:#111#, but it closes the match with the second img's closing bracket rather than the first one's. So what gets replaced includes all this:
#]" alt="image1" /><img src="[File:#222
What is wrong with my regular expression? How can I prevent this look-ahead from happening and stop at the first closing bracket?
Thanks in advance.
Upvotes: 1
Views: 278
Reputation: 478
The problem is that it's grabbing everything from the first # sign to the last one on the line, because you're using (.*), which is greedy and matches any character, including # and ]. Try this instead, which limits the captured part to numeric digits only:
html = html.replace(/\[File:#([0-9]*)#\]/g,
function($0, $1){ return furls[$1]; });
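To make the difference visible, here is a minimal sketch that runs both patterns on the one-line input from the question. The contents of furls are made up for illustration (the question does not show them), and the variable names are my own.

```javascript
// Hypothetical lookup table standing in for the asker's furls array.
const furls = { 111: "a.png", 222: "b.png" };
const input = '<img src="[File:#111#]" alt="image1" /><img src="[File:#222#]" />';

// Greedy (.*) runs on to the last "#]" on the line, so only one match is found.
const greedyMatches = input.match(/\[File:#(.*)#\]/g);

// A digit-only class cannot cross "#", so each placeholder matches on its own.
const digitMatches = input.match(/\[File:#([0-9]*)#\]/g);

// Replace each placeholder with the entry looked up by the captured id.
const fixed = input.replace(/\[File:#([0-9]*)#\]/g, (match, id) => furls[id]);

console.log(greedyMatches.length, digitMatches.length);
console.log(fixed);
```

Using [0-9]+ instead of [0-9]* would additionally refuse an empty id such as [File:##], which may or may not be what you want.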
Upvotes: 1
Reputation: 5647
Well, a correct regex for your case would be (note the escaped square brackets and the capture group around the id):
/\[File:#(\w+)#\]/g
Why is this the case?
Because in your regex:
The . matches any character, except for line breaks if dotall is false.
The * matches 0 or more of the preceding token. This is a greedy match, and will match as many characters as possible before satisfying the next token.
And in the regex I've provided:
The \w matches any word character (alphanumeric & underscore). Since it cannot match # or ], the match has to stop at the first #].
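If the ids are not guaranteed to be word characters, a lazy quantifier is another way to stop at the first closing bracket. The sketch below compares the two options; the furls values and variable names are assumptions for illustration only.

```javascript
// Hypothetical lookup table; keys are the ids that appear between the # signs.
const furls = { 111: "a.png", 222: "b.png" };
const input = '<img src="[File:#111#]" /><img src="[File:#222#]" />';

// Lazy .*? expands only as far as the first "#]" that lets the match succeed.
const lazy = input.replace(/\[File:#(.*?)#\]/g, (match, id) => furls[id]);

// \w+ cannot match "#" or "]", so it also stops at the first "#]".
const word = input.replace(/\[File:#(\w+)#\]/g, (match, id) => furls[id]);

console.log(lazy);
console.log(word);
```

Both calls give the same result on this input. The \w+ version is stricter, because it leaves placeholders with unexpected characters untouched instead of looking them up in furls.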
Upvotes: 1