Reputation: 6943
I need to redact health information from emails that are loaded into a string variable by replacing characters with █. In these emails, the content between the phrases "health issues?" and "Have you worked" needs to be replaced, but anything that appears inside tags should be left alone. Additionally, lines are often wrapped with = signs, and those newlines, spaces, and = signs can occur right in the middle of a tag. They can also occur in the middle of the strings used to identify the start and end.
Example:
(More content)
.....have any health issues? We currently do not have any health issues</sp=
an></li>
<li id=3D"m_-622133557606915713yui_3_16_0_ym19_1_1515713539439_17326" styl=
e=3D"margin-top:0;margin-bottom:0;vertical-align:middle;line-height:15pt;co=
lor:black"><span id=3D"m_-622133557606915713yui_3_16_0_ym19_1_1515713539439=
_17327" style=3D"font-family:Arial;font-size:11.0pt">Some more text.
Have
you worked.....(more content)
I figure there is a way to do this in JavaScript using one or more regular expressions, but I am at a loss to see how.
The desired result would look like:
(More content)
.....have any health issues?███████████████████████████████████████████</sp=
an></li>
<li id=3D"m_-622133557606915713yui_3_16_0_ym19_1_1515713539439_17326" styl=
e=3D"margin-top:0;margin-bottom:0;vertical-align:middle;line-height:15pt;co=
lor:black"><span id=3D"m_-622133557606915713yui_3_16_0_ym19_1_1515713539439=
_17327" style=3D"font-family:Arial;font-size:11.0pt">███████████████
Have
you worked.....(more content)
Upvotes: 0
Views: 227
Reputation: 48741
You could use two replace() calls to solve this problem. The first one matches everything from "health issues?" to "Have you worked", captured into three capturing groups. We are interested in the second capturing group:
(health issues\?)([\s\S]*?)(Have\s+you\s+worked)
                 ^^^^^^^^^^
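As a rough illustration of what the three groups contain, the sketch below runs the same regex on a made-up miniature input. The sample string and the variable names are assumptions for the example; only the two marker phrases come from the question.

```javascript
// Hypothetical miniature input; only the marker phrases are from the question.
const sample = 'any health issues? None at <b>all</b>.\nHave\nyou worked here?';

const outer = /(health issues\?)([\s\S]*?)(Have\s+you\s+worked)/;
const groups = sample.match(outer);

const startMarker = groups[1]; // 'health issues?'
const toRedact = groups[2];    // ' None at <b>all</b>.\n' (the part to mask)
const endMarker = groups[3];   // 'Have\nyou worked' (\s+ absorbs the newline)

console.log(JSON.stringify(toRedact));
```

The lazy quantifier in the second group stops at the first occurrence of the end phrase, so only the text between the two markers is handed to the next step.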
We run our second replace() on this captured group and substitute each character outside of tags with a █. This is the regex:
(<\/?\w[^<>]*>)|[\s\S]
We need to keep the first capturing group (these are probably HTML tags) and replace whatever matches the other side of the alternation, [\s\S], with the mentioned character.
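To see the alternation at work in isolation, here is a minimal sketch that applies the regex to a short fragment. The helper name maskOutsideTags and the fragment are made up for illustration.

```javascript
// Hypothetical helper; the regex is the one described above.
function maskOutsideTags(fragment) {
  return fragment.replace(/(<\/?\w[^<>]*>)|[\s\S]/g, (whole, tag) =>
    // `tag` is undefined whenever the [\s\S] alternative matched instead.
    tag ? tag : '█'
  );
}

const masked = maskOutsideTags(' None at <b>all</b>.');
console.log(masked); // nine blocks, then <b>███</b>█
```

Because the tag alternative is tried first at every position, a whole tag is consumed and returned unchanged before [\s\S] gets a chance to match its opening <.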
Disclaimer: this is not bulletproof, as regex shouldn't be used to parse HTML. For example, a > inside an attribute value would cut the tag match short.
Demo:
var str = `(More content)
.....have any health issues? We currently do not have any health issues</sp=
an></li>
<li id=3D"m_-622133557606915713yui_3_16_0_ym19_1_1515713539439_17326" styl=
e=3D"margin-top:0;margin-bottom:0;vertical-align:middle;line-height:15pt;co=
lor:black"><span id=3D"m_-622133557606915713yui_3_16_0_ym19_1_1515713539439=
_17327" style=3D"font-family:Arial;font-size:11.0pt">Some more text.
Have
you worked.....(more content)`;
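console.log(str.replace(/(health issues\?)([\s\S]*?)(Have\s+you\s+worked)/, function(match, $1, $2, $3) {
  // $2 is the text between the two markers; mask it and keep the markers.
  return $1 + $2.replace(/(<\/?\w[^<>]*>)|[\s\S]/g, function(match, $1) {
    // Keep a matched tag as-is; replace any other character with a block.
    return $1 ? $1 : '█';
  }) + $3;
}));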
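The demo above also masks the line breaks between tags, whereas the desired output in the question keeps them. It also expects the start phrase to appear unbroken. The following is one possible variant, not part of the original answer, that leaves line breaks (with an optional preceding = wrap sign) untouched and tolerates a wrap inside either marker phrase. The names SOFT_BREAK, flexiblePhrase and redact are hypothetical, and the sketch assumes a wrap always looks like "=" followed by a newline.

```javascript
// Assumption: a wrapped line ends with "=" followed by a newline.
const SOFT_BREAK = '(?:=\\r?\\n)?';

// Build a pattern for a phrase that may be wrapped between any two characters.
function flexiblePhrase(phrase) {
  return phrase
    .split('')
    .map(ch => (/\s/.test(ch) ? '\\s+' : ch.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')))
    .join(SOFT_BREAK);
}

function redact(text) {
  const outer = new RegExp(
    '(' + flexiblePhrase('health issues?') + ')([\\s\\S]*?)(' +
      flexiblePhrase('Have you worked') + ')'
  );
  return text.replace(outer, (whole, start, body, end) =>
    start +
    // Keep tags and line breaks (with an optional "=" before them); mask the rest.
    body.replace(/(<\/?\w[^<>]*>)|(=?\r?\n)|[\s\S]/g,
      (m, tag, lineBreak) => tag || lineBreak || '█') +
    end
  );
}

const wrapped = 'any hea=\nlth issues? None</sp=\nan> at all.\nHave\nyou worked here?';
console.log(redact(wrapped));
```

Whether line breaks should survive is a design choice: keeping them preserves the original line structure of the email, at the cost of revealing where the redacted text wrapped.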
Upvotes: 1