JCoulam

Reputation: 191

Using regular expression to replace special characters outside of html tags

I'm trying to find and replace some special characters with HTML entities, i.e. '&' converts to '&amp;' and '>' converts to '&gt;'. This is for an email builder tool, and some older email clients need these characters replaced with HTML entities.

The user passes in a string, and I use JavaScript to loop through an array of objects. Each object finds one character and replaces it with the correct HTML entity.

You can see the regex code I'm using here:

https://regex101.com/r/WZh5tA/2

    escapeCharacter: function(string){
      var replaceChar = [
        {reg : '&', replace: '&amp;'},
        {reg : '"', replace: '&quot;'},
        {reg : '£', replace: '&pound;'},
        {reg : '€', replace: '&euro;'},
        {reg : 'é', replace: '&eacute;'},
        {reg : '–', replace: '&ndash;'},
        {reg : '®', replace: '&reg;'},
        {reg : '™', replace: '&trade;'},
        {reg : '‘', replace: '&lsquo;'},
        {reg : '’', replace: '&rsquo;'},
        {reg : '“', replace: '&ldquo;'},
        {reg : '”', replace: '&rdquo;'},
        {reg : '#', replace: '&#35;'},
        {reg : '©', replace: '&copy;'},
        {reg : '@', replace: '&#64;'},
        {reg : '$', replace: '&#36;'},
        {reg : '\\(', replace: '&#40;'},
        {reg : '\\)', replace: '&#41;'},
        {reg : '<', replace: '&lt;'},
        {reg : '>', replace: '&gt;'},
        {reg : '…', replace: '&hellip;'},
        {reg : '-', replace: '&#45;'},
        {reg : "'", replace: '&#39;'},
        {reg : '\\*', replace: '&#42;'},
        {reg : ',', replace: '&sbquo;'}
      ];
      var s = string;
      replaceChar.forEach(function(obj){
        var regEx = new RegExp(obj.reg+"(?!([^<]+)?>)", "g");
        s = s.replace(regEx, obj.replace);
      });

      return s;
    }

The problem occurs when the user passes a string with html tags (which they should be allowed to do). For example, the string could be:

'This is an example of some <b>bold</b> text'

My find and replace tool works its magic, but I think I'm missing something, because I get this output:

'This is an example of some <b>bold</b&gt; text'

Upvotes: 0

Views: 2086

Answers (1)

Wiktor Stribiżew

Reputation: 627292

You may use

    s = s.replace(
      new RegExp("(<[^<>]*>)|" + obj.reg.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&'), "g"),
      function ($0, $1) { return $1 ? $0 : obj.replace; }
    );

Notes:

  • You need to escape obj.reg before using it in a regular expression, hence .replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&') is required.
  • The (<[^<>]*>)| alternative matches <...> substrings and captures them into Group 1, so the character you are looking for can no longer match inside a tag. The callback passed as the replacement argument checks whether Group 1 matched: if it did, the whole match is returned as is; otherwise, the replacement is returned.
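Putting the two notes together, the whole function could look roughly like the sketch below. It is only an illustration under some assumptions: the entity table is a shortened, hypothetical stand-in for the full array in the question, escapeRegExp is a helper name introduced here, and the entries hold plain characters ('(' rather than '\\('). That last point matters because the escaping step would otherwise escape the backslash as well, and the pattern would then look for a literal backslash followed by the character.

```javascript
// Hypothetical, shortened entity table. Entries are plain characters,
// not regex source, because escapeRegExp() below does the escaping.
var replaceChar = [
  { reg: '&', replace: '&amp;' },
  { reg: '<', replace: '&lt;' },
  { reg: '>', replace: '&gt;' },
  { reg: '$', replace: '&#36;' },
  { reg: '(', replace: '&#40;' },
  { reg: ')', replace: '&#41;' },
  { reg: '*', replace: '&#42;' }
];

// Helper (name assumed for this sketch): escape regex metacharacters
// so that each table entry is matched literally.
function escapeRegExp(text) {
  return text.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&');
}

function escapeCharacter(string) {
  var s = string;
  replaceChar.forEach(function (obj) {
    // Alternative 1 captures a whole <...> tag into Group 1;
    // alternative 2 is the literal character to replace.
    var regEx = new RegExp('(<[^<>]*>)|' + escapeRegExp(obj.reg), 'g');
    s = s.replace(regEx, function (match, tag) {
      // Group 1 matched: this is a tag, so return it unchanged.
      // Otherwise the character itself matched: return the entity.
      return tag ? match : obj.replace;
    });
  });
  return s;
}

console.log(escapeCharacter('Fish & chips <b>bold</b> (5 > 3) cost $4*'));
// Fish &amp; chips <b>bold</b> &#40;5 &gt; 3&#41; cost &#36;4&#42;
```

One limitation to keep in mind: anything of the form <...> is treated as a tag, so a stray '<' followed later by a '>' in ordinary text (for example '3 < 5 and 6 > 4') would be left untouched.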

Upvotes: 2
