Reputation: 6382
Suppose I have a string like this:
<code>Blah blah Blah
enter code here</code>
<code class="lol">enter code here
fghfgh</code>
I want to use JavaScript to replace everything between the <code> tags, using a callback function that HTML-encodes the content.
This is what I have currently:
function code_parsing(data) {
    // Don't escape & because we need that... in case we deliberately write them in
    var escape_html = function(data, p1, p2, p3, p4) {
        return p1.replace(/</g, "&lt;").replace(/>/g, "&gt;").replace(/"/g, "&quot;").replace(/'/g, "&#39;");
    };
    data = data.replace(/<code[^>]*>([\s\S]*?)<\/code>/gm, escape_html);
    // \[start\](.*?)\[end\]
    return data;
};
Unfortunately, this function removes the <code> tags and replaces them with just the content. I would like to keep the <code> tags, with any number of attributes. If I just hardcode the <code> tag back in, I will lose the attributes.
I know regex isn't the best tool, but there won't be any nested elements in it.
Upvotes: 2
Views: 1627
Reputation: 1959
Simple solution: in your escape_html function, after the operation is done on the string but BEFORE you return it, prepend and append your tags to the string and return the full thing.
Sometimes the simplest answer is the best :)
Upvotes: 1
Reputation: 120586
You shouldn't use regular expressions to parse HTML.
That said, you need to capture the content you want to preserve using a parenthetical group and have your replacer append that to the bit you manipulate.
data.replace(/(<code[^>]*>)([\s\S]*?)(<\/code>)/g,
    function (_, startTag, body, endTag) {
        return startTag + escapeHtml(body) + endTag;
    });
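For completeness, here is a minimal runnable sketch of that approach. The escapeHtml helper and the codeParsing wrapper are hypothetical names filled in for illustration; the helper is assumed to escape the same four characters as the question's code and to leave & alone.

```javascript
// Hypothetical helper (not defined in the snippet above): escapes <, >, " and '
// but deliberately leaves & untouched, as the question asks.
function escapeHtml(text) {
  return text
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Hypothetical wrapper: captures the opening tag, the body and the closing tag
// separately, so the opening tag (with its attributes) is put back unchanged.
function codeParsing(data) {
  return data.replace(
    /(<code[^>]*>)([\s\S]*?)(<\/code>)/g,
    function (_, startTag, body, endTag) {
      return startTag + escapeHtml(body) + endTag;
    }
  );
}

const input = '<code class="lol">if (a < b) { say("hi"); }</code>';
console.log(codeParsing(input));
// prints: <code class="lol">if (a &lt; b) { say(&quot;hi&quot;); }</code>
```

Because the start tag is captured rather than hardcoded, any attributes on it survive the replacement.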
To understand why you shouldn't use regular expressions to parse HTML, consider what this does to:
<code title="Shows how to tell whether x > y">if (x > y) { ... }</code>
<code lang="js">node.style.color = "<code lang="css">#ff0000</code>"</code>
<code>foo</CODE >
<textarea><code>My HTML code goes here</code></textarea>
<code>foo <!-- commented out </code> --></code>
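To see the failure modes concretely, a sketch like the following runs the same regex over three of the inputs above. The escapeHtml and escapeCodeBodies names are hypothetical, and escapeHtml is assumed to escape <, >, " and ' only.

```javascript
// Hypothetical helper: escapes <, >, " and ' and leaves & alone.
function escapeHtml(text) {
  return text
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Hypothetical wrapper around the regex-based replacement shown earlier.
function escapeCodeBodies(data) {
  return data.replace(
    /(<code[^>]*>)([\s\S]*?)(<\/code>)/g,
    (_, startTag, body, endTag) => startTag + escapeHtml(body) + endTag
  );
}

// 1. A ">" inside an attribute value ends the "start tag" too early,
//    so the rest of the attribute is treated as body text and escaped.
const attr = '<code title="Shows how to tell whether x > y">if (x > y) { ... }</code>';
console.log(escapeCodeBodies(attr));
// prints: <code title="Shows how to tell whether x > y&quot;&gt;if (x &gt; y) { ... }</code>

// 2. An upper-case closing tag with a trailing space is never matched,
//    so the input comes back unchanged.
const upper = "<code>foo</CODE >";
console.log(escapeCodeBodies(upper) === upper);
// prints: true

// 3. A "</code>" inside a comment is taken as the real closing tag,
//    leaving the rest of the comment outside the escaped body.
const comment = "<code>foo <!-- commented out </code> --></code>";
console.log(escapeCodeBodies(comment));
// prints: <code>foo &lt;!-- commented out </code> --></code>
```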
Upvotes: 3