Reputation: 148
I want to capture HTML comments with the exception of one specific comment, namely:
<!-- end-readmore-item -->
At the moment, I can successfully capture all of the HTML comments using the regex below:
(?=<!--)([\s\S]*?)-->
To ignore the specified comment, I have tried lookahead and lookbehind assertions, but being new to the more advanced parts of regex, I am probably missing something.
So far, I have been able to devise the following regex using lookarounds:
^((?!<!-- end-readmore-item -->).)*$
I expect it to ignore the end-readmore-item comment and only capture other comments such as:
<!-- Testing-->
It does skip that comment, but it also captures the regular HTML tags, which I want to be ignored as well.
I have been using the following HTML code as a test case:
<div class="collapsible-item-body" data-defaulttext="Further text">Further
text</div>
<!-- end-readmore-item --></div>
</div>
<!-- -->
It should only match <!-- -->, but it selects everything except <!-- end-readmore-item -->.
The goal is to remove all HTML comments except <!-- end-readmore-item -->.
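For reference, the behaviour described above can be reproduced with a short Python sketch. It assumes the line-based pattern is applied with the multiline flag; the engine and flags actually used are not stated here, and the variable names are only illustrative. Because the pattern is anchored with ^ and $, it matches whole lines that do not contain the marker, whether or not they are comments.

```python
import re

# Assumption: the pattern is run in multiline mode, so ^ and $ match per line.
line_pattern = re.compile(r"^((?!<!-- end-readmore-item -->).)*$", re.MULTILINE)

# The test case from the question (no trailing newline).
html = (
    '<div class="collapsible-item-body" data-defaulttext="Further text">Further\n'
    'text</div>\n'
    '<!-- end-readmore-item --></div>\n'
    '</div>\n'
    '<!-- -->'
)

# Every line without the marker is matched in full, tags included.
matched_lines = [m.group(0) for m in line_pattern.finditer(html)]
for line in matched_lines:
    print(repr(line))
```

Only the line containing the marker is skipped; the plain tag lines are matched along with <!-- -->.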
Upvotes: 3
Views: 81
Reputation: 19641
You can use the following pattern:
<!--(?!\s*?end-readmore-item\s*-->)[\s\S]*?-->
Breakdown:
<!-- # Matches `<!--` literally.
(?! # Start of a negative Lookahead (not followed by).
\s* # Matches zero or more whitespace characters.
end-readmore-item # Matches literal string.
\s* # Matches zero or more whitespace characters.
--> # Matches `-->` literally.
) # End of the negative Lookahead.
[\s\S]*? # Matches any character zero or more times (lazy match),
# including whitespace and non-whitespace characters.
--> # Matches `-->` literally.
Which basically means:
Match <!-- that is not followed by [whitespace* + end-readmore-item + whitespace* + -->] and which is followed by any number of characters, then immediately followed by -->.
* Whitespace here is optional: zero or more whitespace characters.
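As a rough illustration, here is how the pattern might be used to strip comments in Python. This is a sketch under assumptions: the regex engine in use is not stated in the question, the extra <!-- Testing--> line is appended to the test case for demonstration, and the variable names are hypothetical.

```python
import re

# The pattern from this answer: any comment except the end-readmore-item marker.
comment_except_marker = re.compile(
    r"<!--(?!\s*?end-readmore-item\s*-->)[\s\S]*?-->"
)

# Test case from the question, plus one extra comment for demonstration.
html = (
    '<div class="collapsible-item-body" data-defaulttext="Further text">Further\n'
    'text</div>\n'
    '<!-- end-readmore-item --></div>\n'
    '</div>\n'
    '<!-- -->\n'
    '<!-- Testing-->\n'
)

# Comments the pattern matches (the marker comment is skipped).
matches = comment_except_marker.findall(html)

# Remove the matched comments and keep everything else, marker included.
cleaned = comment_except_marker.sub("", html)

print(matches)
print(cleaned)
```

The marker comment survives the substitution because the negative lookahead rejects the match right after its opening <!--.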
Upvotes: 2
Reputation: 159
You are very close with your negative lookahead assertion; you just need to modify it as follows:
<!--((?!end-readmore-item).)*?-->
Here *? makes the repetition non-greedy.
This will match all comments except those that contain the string end-readmore-item inside the comment body.
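A small Python sketch of how this pattern behaves; the sample input and variable names are made up for illustration. Two things are worth noting: any comment that mentions end-readmore-item anywhere in its body is skipped, not just the exact marker, and since . does not match newlines by default, multi-line comments are only matched when a dot-all flag (re.DOTALL in Python) is set.

```python
import re

pattern = r"<!--((?!end-readmore-item).)*?-->"

# Hypothetical input: the marker, a comment mentioning the marker,
# a plain comment, and a comment that spans two lines.
html = (
    "<!-- end-readmore-item -->"
    "<!-- note about end-readmore-item usage -->"
    "<!-- Testing-->"
    "<!-- multi\nline -->"
)

# Default flags: '.' stops at newlines, so the two-line comment is missed.
single_line = [m.group(0) for m in re.finditer(pattern, html)]

# With re.DOTALL, '.' also matches newlines.
multi_line = [m.group(0) for m in re.finditer(pattern, html, flags=re.DOTALL)]

print(single_line)
print(multi_line)
```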
Upvotes: 1