How to replace content between html tags without replacing the tags themselves

Question

Suppose I have a string like this:

Blah blah Blah
<code>enter code here</code>
<code>enter code here</code>
fghfgh

I want to use JavaScript to replace everything between the <code> tags, using a callback function that HTML-encodes it.



This is what I have currently:

function code_parsing(data){
    //Don't escape & because we need that... in case we deliberately write them in
    var escape_html = function(data, p1, p2, p3, p4) {
        return p1.replace(/</g, "&lt;").replace(/>/g, "&gt;").replace(/"/g, "&quot;").replace(/'/g, "&#39;");
    };

    data = data.replace(/<code[^>]*>([\s\S]*?)<\/code>/gm, escape_html);
    // $$start$$(.*?)$$end$$
    return data;
}

This function is unfortunately removing the "<code>" tags and replacing them with just the content. I would like to keep the tags with any number of attributes. If I just hardcode the tag back into it, I will lose the attributes.



I know regex isn't the best tool, but there won't be any nested elements in it.

Mike Samuel · Accepted Answer

You shouldn't use regular expressions to parse HTML.

That said, you need to capture the parts you want to preserve (the start and end tags) in parenthesized groups and have your replacer put them back around the part you manipulate.

data.replace(/(<code[^>]*>)([\s\S]*?)(<\/code>)/g,
             function (_, startTag, body, endTag) {
               return startTag + escapeHtml(body) + endTag;
             })
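
The snippet above calls escapeHtml without defining it. Here is a minimal runnable sketch that fills that in, assuming the same escaping rules as the question (everything except & is encoded). The names escapeHtml and encodeCodeBodies and the sample input are hypothetical, made up for illustration.

```javascript
// Hypothetical helper: the answer uses escapeHtml but never defines it.
// Like the question's version, it deliberately leaves & untouched.
function escapeHtml(text) {
  return text
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Hypothetical wrapper around the replace call from the answer.
function encodeCodeBodies(html) {
  return html.replace(
    /(<code[^>]*>)([\s\S]*?)(<\/code>)/g,
    function (_, startTag, body, endTag) {
      // The start tag (with its attributes) and the end tag are put back
      // unchanged; only the body between them is encoded.
      return startTag + escapeHtml(body) + endTag;
    }
  );
}

var input = 'Blah <code class="js">if (a < b) { say("hi"); }</code> fghfgh';
console.log(encodeCodeBodies(input));
// Blah <code class="js">if (a &lt; b) { say(&quot;hi&quot;); }</code> fghfgh
```

Because the start tag is captured whole, any attributes on it survive, which is what hardcoding the tag back in would lose.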

To understand why you shouldn't use regular expressions to parse HTML, consider what this does to

if (x > y) { ... }

node.style.color = "#ff0000"

foo

<code>My HTML code goes here</code>

foo
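
To make the failure mode concrete, here is a small sketch on made-up input (not the example above): an attribute value that contains ">". The regex has no notion of quoted attribute values, so [^>]* ends the start tag too early and part of the tag is escaped as if it were body text. The helper escapeHtml and the input string are assumptions for illustration.

```javascript
// Hypothetical helper with the same escaping rules as the question.
function escapeHtml(text) {
  return text
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Made-up input: the title attribute legitimately contains a ">" character.
var tricky = '<code title="a > b">x < y</code>';

var result = tricky.replace(
  /(<code[^>]*>)([\s\S]*?)(<\/code>)/g,
  function (_, startTag, body, endTag) {
    // Here startTag is '<code title="a >' and body is ' b">x < y',
    // so the tail of the attribute is treated as body text.
    return startTag + escapeHtml(body) + endTag;
  }
);

console.log(result);
// <code title="a > b&quot;&gt;x &lt; y</code>
```

The start tag in the output is no longer well-formed. Comments, textareas and scripts that contain the text "<code>" trip the regex in similar ways, which a real HTML parser would handle correctly.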
