James

Reputation: 1956

Regex appears to ignore multiple piped characters

Apologies for the awkward question title. I have the following JavaScript:

var wordRe = new RegExp('\\b(?:(?![<^>"])fox|hello(?![<\/">]))\\b', 'g'); // Words regex

console.log('<span>hello</span> <hello>fox</hello> <a href="hello">fox link</a> hello my name is fox'.replace(wordRe, 'foo'));

What I'm trying to do is replace any word that isn't nested in an HTML tag, or part of an HTML tag itself, i.e. I want to match only "plain" text. The expression seems to be ignoring the rule for the first piped match "fox", and replacing it when it shouldn't be.

Can anyone point out why this is? I think I might have organised the expression incorrectly (at least the negative lookahead).

Here is the JSFiddle.

I'd also like to add that I am aware of the implications of using regex with HTML :)

Upvotes: 2

Views: 92

Answers (1)

Stephan

Reputation: 43033

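For your regex to work, you need lookbehind: the negative lookahead placed before "fox" only inspects the character that follows it, which is always the "f" of "fox", so it can never reject a match. However, as of this writing, lookbehind is not supported in JavaScript.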
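A minimal sketch of why the rule for "fox" appears to be ignored. The variable names and the shortened sample string are illustrative assumptions, not taken from the question. It compares the "fox" branch with and without its leading lookahead and shows that both behave the same:

```javascript
// A lookahead placed directly before "fox" tests the NEXT character,
// which is always "f" whenever "fox" follows, so it can never fail.
var withLookahead = /\b(?![<^>"])fox\b/g;
var withoutLookahead = /\bfox\b/g;
var sample = '<hello>fox</hello> fox';

var a = sample.replace(withLookahead, 'foo');
var b = sample.replace(withoutLookahead, 'foo');

console.log(a === b); // true: the lookahead changes nothing
console.log(a);       // <hello>foo</hello> foo
```

The "hello" branch behaves differently because its lookahead sits after the word, where it really does see the `<`, `>` or `"` that follows.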

Here is a workaround:

Instead of matching what we want, we will match what we don't want and remove it from our input string. Later, we can perform the replace on the cleaned input string.

var nonWordRe = new RegExp('<([^>]+).*?>[^<]+?</\\1>', 'g');
var test = '<span>hello</span> <hello>fox</hello> <a href="hello">fox link</a> hello my name is fox';

var cleanedTest = test.replace(nonWordRe, '');

var final = cleanedTest.replace(/\b(?:fox|hello)\b/g, 'foo'); // 'g' flag replaces every match; once trimmed, final == 'foo my name is foo'


Note:

I built this workaround based on your sample. Here are some points you may need to address if you run into them:

  • you may need to remove self closing tags (<([^>]+).*?/\>) from the test string
  • you may need to trim the final string (final)
  • you may need a decent HTML parser if tags can contain other tags, as HTML allows this. JavaScript does not, again as of this writing, support recursive patterns.
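The first two points above can be sketched together as follows. This is only an illustration under stated assumptions: the helper name `replacePlainWords`, the simplified patterns, and the extra `<br/>` in the sample input are hypothetical and not part of the original workaround. Like the workaround, it discards tagged content rather than preserving it, and it does not handle nested tags.

```javascript
// Hypothetical helper (name and patterns are assumptions for illustration).
function replacePlainWords(input, replacement) {
  // <tag ...>text</tag> with no nested tags; \b stops the tag name from
  // being shortened by backtracking (e.g. "br" matching a closing </b>).
  var pairedTagRe = /<(\w+)\b[^>]*>[^<]*<\/\1>/g;
  // Self-closing tags such as <br/> or <img src="x" />
  var selfClosingRe = /<[^>]+\/>/g;
  // The words to replace, as whole words, everywhere ('g' flag)
  var wordRe = /\b(?:fox|hello)\b/g;

  return input
    .replace(pairedTagRe, '')   // drop tagged content
    .replace(selfClosingRe, '') // drop self-closing tags
    .trim()                     // remove leftover leading/trailing spaces
    .replace(wordRe, replacement);
}

var out = replacePlainWords(
  '<span>hello</span> <br/> <a href="hello">fox link</a> hello my name is fox',
  'foo'
);
console.log(out); // foo my name is foo
```

For markup where tags can contain other tags, the third point still applies: a real HTML parser is the safer choice.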

Demo

http://jsfiddle.net/yXd82/2/

Upvotes: 1
