Reputation: 4291
I have a string that is formed from tag-substitution, which also results in parts of the string being marked for deletion, for example:
Keep1
{/*DELETE}
Delete1a
{/*DELETE}
Delete2
{DELETE*/}
Delete1b
{DELETE*/}
Keep2
{/*DELETE}
Delete3
{DELETE*/}
Keep3
Am I correct that a RegEx cannot be used to select only the inner Delete2 and Delete3 blocks, remove those, and then repeat to get the Delete1a/Delete1b block until no further matches are found?
The RegEx I am passing to my replace function is
\{\/\*DELETE\}([\s\S]*?)\{DELETE\*\/\}
This matches
{/*DELETE}
Delete1a
{/*DELETE}
Delete2
{DELETE*/}
If this is the only RegEx match that I can make I could [suppress the leading {/*DELETE} and] call the replace function recursively which, I think, would enable me to remove the nested {TAGS}.
Is there a better way?
I am using the RegEx in VBScript
EDIT: In case it helps, I can change the {/*DELETE} and {DELETE*/} tags, even to a single character.
EDIT2: I could use a single character as the Start/End delete marker if, for example, that would be faster for a RegEx to resolve (e.g. by being less complex). For example, if the Start-Delete is [ and the End-Delete is ]:
Keep1
[
Delete1a
[
Delete2
]
Delete1b
]
Keep2
[
Delete3
]
Keep3
These characters were chosen for appearance in this example; in practice they would occur within my real-world data, but I expect I could choose two ASCII values which do not appear in my data at all.
Clarification: The {DELETE} tags will not always appear on a line by themselves, so this style of string formation will also exist:
Keep1{/*DELETE}Delete1a
{/*DELETE}Delete2{DELETE*/}
Delete1b{DELETE*/}Keep2a
Keep2b{/*DELETE}Delete3{DELETE*/}Keep3
or with single-character delete-tags:
Keep1[Delete1a
[Delete2]
Delete1b]Keep2a
Keep2b[Delete3]Keep3
Upvotes: 2
Views: 216
Reputation: 626861
If your delimiters are multicharacter tags, you may use a tempered greedy token:
\{\/\*DELETE}((?:(?!\{\/\*DELETE})[\s\S])*?)\{DELETE\*\/}
The (?:(?!\{\/\*DELETE})[\s\S])*? part matches any char, 0 or more times, that is not the starting point of a {/*DELETE} char sequence, so each match is an innermost block. Run this regex replace repeatedly until nothing matches any more; see the Iteration 1 and Iteration 2 demos.
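To make the "replace, then replace again" idea concrete, here is a minimal sketch of the loop. It is written in Python only because that is easy to run and check; in VBScript the same idea would presumably be a RegExp object with Global = True whose Replace method is called until the string stops changing. The function name strip_deleted and the sample string are illustrative assumptions.

```python
import re

# Pattern from the answer: a start tag, then any chars that do not begin
# another start tag, then the nearest end tag (i.e. an innermost block).
INNERMOST = re.compile(r"\{\/\*DELETE}((?:(?!\{\/\*DELETE})[\s\S])*?)\{DELETE\*\/}")


def strip_deleted(text: str) -> str:
    """Remove innermost DELETE blocks repeatedly until none remain."""
    while True:
        # subn returns the new string and the number of replacements made.
        text, count = INNERMOST.subn("", text)
        if count == 0:
            return text


sample = (
    "Keep1{/*DELETE}Delete1a\n"
    "{/*DELETE}Delete2{DELETE*/}\n"
    "Delete1b{DELETE*/}Keep2a\n"
    "Keep2b{/*DELETE}Delete3{DELETE*/}Keep3"
)
print(strip_deleted(sample))
```

Each pass strips one nesting level, so the number of passes is the maximum nesting depth plus one final pass that finds nothing.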
NOTE that if you have these delimiters inside comments or string literals, this won't work correctly.
To make it safe, you may define that the delimiting tags only appear as single entities on a line:
^\s*\{\/\*DELETE}(\s*(?:\r?\n(?!\s*\{(?:\/\*DELETE|DELETE\*\/)}).*)*)\r?\n\s*\{DELETE\*\/}\s*$
See Iteration 1 and Iteration 2 demos (here, you will need to enable regExp.Multiline = True).
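The line-anchored variant can be looped in the same way. The sketch below is again Python rather than VBScript, with re.MULTILINE standing in for regExp.Multiline = True; the name strip_deleted_lines is an assumption made for illustration. Note that this pattern removes the tag lines and what lies between them but not the surrounding line breaks, so blank lines may be left behind.

```python
import re

# Line-anchored pattern from the answer. re.MULTILINE makes ^ and $ match
# at line boundaries, which is what regExp.Multiline = True does in VBScript.
LINE_BLOCK = re.compile(
    r"^\s*\{\/\*DELETE}"
    r"(\s*(?:\r?\n(?!\s*\{(?:\/\*DELETE|DELETE\*\/)}).*)*)"
    r"\r?\n\s*\{DELETE\*\/}\s*$",
    re.MULTILINE,
)


def strip_deleted_lines(text: str) -> str:
    """Remove innermost blocks whose tags sit alone on their lines."""
    while True:
        text, count = LINE_BLOCK.subn("", text)
        if count == 0:
            return text


sample_lines = "\n".join([
    "Keep1", "{/*DELETE}", "Delete1a", "{/*DELETE}", "Delete2", "{DELETE*/}",
    "Delete1b", "{DELETE*/}", "Keep2", "{/*DELETE}", "Delete3", "{DELETE*/}",
    "Keep3",
])
# Print only the non-blank lines that survive.
print(strip_deleted_lines(sample_lines).split())
```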
Single-character delimiters are by far the easiest scenario: you match the starting delimiter char, then any 0+ chars other than the starting and ending delimiter chars using a negated character class, and then the ending delimiter char.
If the starting delimiter char is [ and the ending delimiter char is ], the regex is the well-known
\[[^\][]*\]
See the regex demo: Iteration 1 and Iteration 2.
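As a small runnable illustration of the two iterations, here is the same loop with the bracket pattern, sketched in Python (not the asker's VBScript). The helper name strip_bracketed is hypothetical, and the sample is the single-character example from the question.

```python
import re

# "[", then any run of chars that are neither "[" nor "]", then "]":
# this can only match a pair with no other brackets inside it.
INNERMOST_BRACKETS = re.compile(r"\[[^\][]*\]")


def strip_bracketed(text: str) -> str:
    """Remove innermost [...] spans repeatedly until none remain."""
    while True:
        text, count = INNERMOST_BRACKETS.subn("", text)
        if count == 0:
            return text


bracket_sample = "Keep1[Delete1a\n[Delete2]\nDelete1b]Keep2a\nKeep2b[Delete3]Keep3"
print(strip_bracketed(bracket_sample))
```

The negated character class needs no lookahead at every position, which is why this form is simpler for the engine than the tempered greedy token.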
Note that [ and ] usually are part of the data you need, so perhaps you will want to use some fancier paired characters, like ⦅ (U+2985 LEFT WHITE PARENTHESIS) and ⦆ (U+2986 RIGHT WHITE PARENTHESIS):
\u2985[^\u2985\u2986]*\u2986
See another regex demo.
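Since the pattern has the same shape whichever two characters are chosen, it can be built from the delimiters. The following is only a sketch in Python (the question itself is about VBScript); make_stripper is a hypothetical helper name, and it assumes both delimiters are single characters that differ from each other.

```python
import re
from typing import Callable


def make_stripper(open_ch: str, close_ch: str) -> Callable[[str], str]:
    """Build a function that removes nested open_ch...close_ch spans."""
    o, c = re.escape(open_ch), re.escape(close_ch)
    # open, then anything that is neither delimiter, then close
    innermost = re.compile(f"{o}[^{o}{c}]*{c}")

    def strip(text: str) -> str:
        while True:
            text, count = innermost.subn("", text)
            if count == 0:
                return text

    return strip


strip_white_parens = make_stripper("\u2985", "\u2986")
print(strip_white_parens("Keep1\u2985Delete1a\u2985Delete2\u2986Delete1b\u2986Keep2"))
```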
Upvotes: 2