Reputation: 1590
Versions of this question have been asked several times here, and from those answers I was able to put together two different regex patterns.
One that strips all HTML
1. <[^>]*>
And one that strips everything but the anchor tags
2. <a[^>]*>([^<]+)<\/a>
I have had no luck combining those into a single regex that strips all HTML but keeps the anchors, i.e. (1 + !2). So for now I go through my HTML once with the first regex, and if I encounter a certain keyword that usually lives inside the anchors, I go through the body again with the second regex and combine both results.
That is clearly not ideal and will most likely miss many anchors.
What would a single regex that matches all HTML except the anchors look like? Something like /1?!2/
Test data: https://www.regextester.com/?fam=105725 (I need everything that is in ALL CAPS, plus the anchor around it.)
Upvotes: 1
Views: 57
Reputation: 8332
Disregarding my own comment ;) - Is this what you're after?
Replace
<((?!a|\/a)[^>]*)>\s*
with empty string.
The negative look-ahead right after the opening < makes sure the pattern skips anchor tags (both <a ...> and </a>), so they are left in place.
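For anyone who wants to try the replacement outside a regex tester, here is a minimal Python sketch. The function name, the sample string, and the stricter variant are my own assumptions for illustration, not part of the original answer. One thing worth knowing: because the look-ahead only checks the first character, the pattern also leaves alone other tags whose name starts with "a" (for example <abbr> or <article>). The hypothetical variant below adds a word boundary to avoid that.

```python
import re

# Pattern from the answer: match any tag whose content does not start
# with "a" or "/a", plus trailing whitespace.
KEEP_ANCHORS = re.compile(r"<((?!a|/a)[^>]*)>\s*")

# Hypothetical stricter variant (an assumption, not from the answer):
# the \b makes the look-ahead reject only real <a ...> and </a> tags,
# so <abbr>, <article>, etc. are stripped as well.
KEEP_ANCHORS_STRICT = re.compile(r"<(?!/?a\b)[^>]*>\s*")


def strip_tags_keep_anchors(html, pattern=KEEP_ANCHORS):
    """Replace every match of the pattern with the empty string."""
    return pattern.sub("", html)


# Made-up sample input, just for demonstration.
sample = '<div><p>Intro <a href="/x">ALL CAPS</a> <b>bold</b></p><abbr>HTML</abbr></div>'

print(strip_tags_keep_anchors(sample))
# Intro <a href="/x">ALL CAPS</a> bold<abbr>HTML</abbr>

print(strip_tags_keep_anchors(sample, KEEP_ANCHORS_STRICT))
# Intro <a href="/x">ALL CAPS</a> boldHTML
```

Both patterns assume that no attribute value contains a literal ">"; for arbitrary real-world HTML, a proper parser is usually the safer choice.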
Upvotes: 3