Иво Недев

Reputation: 1590

Single Regex to strip all HTML but the anchors

Versions of this have been asked several times on here, and using those answers I was able to put together two different regex patterns.

One that strips all HTML

1. <[^>]*>

And one that strips everything but the anchor tags

2. <a[^>]*>([^<]+)<\/a>
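To make the two patterns concrete, here is a small Python sketch of what each one does on its own. The sample string and the variable names are made up for illustration; they are not the test data linked in the question.

```python
import re

# Pattern 1 from the question: matches any single tag.
ANY_TAG = re.compile(r"<[^>]*>")
# Pattern 2 from the question: matches a whole anchor element
# whose content is plain text (no nested tags).
ANCHOR = re.compile(r"<a[^>]*>([^<]+)<\/a>")

# Hypothetical sample input, not the asker's real data.
sample = '<p>See <a href="/x">LINK TEXT</a> and <b>bold</b>.</p>'

# Removing every match of pattern 1 drops the anchors too.
stripped = ANY_TAG.sub("", sample)

# Pattern 2 finds the anchors, but only as separate matches.
anchors = [m.group(0) for m in ANCHOR.finditer(sample)]

print(stripped)
print(anchors)
```

Neither pattern alone leaves the anchors in place within the surrounding text, which is why the two passes have to be merged by hand.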

I have no hope of combining those into one regex that strips all HTML but keeps the anchors (so 1 and not 2). Therefore I'm currently going through my HTML once with the first regex, and if I encounter a certain keyword that usually lives inside the anchors, I go through the body again with the second regex and combine both results.

That clearly is not ideal and will most likely miss many anchors.

What would a single regex that matches all HTML but the anchors look like? Something like /1?!2/

Test data: https://www.regextester.com/?fam=105725 I need everything that is ALL CAPS and the anchor around it.

Upvotes: 1

Views: 57

Answers (1)

SamWhan

Reputation: 8332

Disregarding my own comment ;) - Is this what you're after?

Replace

<((?!a|\/a)[^>]*)>\s*

with empty string.

The negative look-ahead after the opening < makes sure it ignores anchors.
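As a rough sketch of how that replacement behaves, here it is applied with Python's `re` module to a made-up sample string. One thing to be aware of: because the look-ahead only checks for a leading `a`, it also skips other tags whose names start with `a`, such as `<abbr>`. The second pattern below, with a `\b` word boundary, is my own assumed variant and not part of the answer above.

```python
import re

# The pattern from the answer: any tag not starting with "a" or "/a",
# plus trailing whitespace.
ANSWER = re.compile(r"<((?!a|\/a)[^>]*)>\s*")

# Assumed variant (not from the answer): \b limits the exception
# to the tag name "a" itself.
WORD_BOUNDED = re.compile(r"<(?!\/?a\b)[^>]*>\s*")

# Hypothetical sample input.
sample = '<div><p>Intro <a href="/x">LINK TEXT</a> <abbr>HTML</abbr></p></div>'

kept_by_answer = ANSWER.sub("", sample)
kept_by_variant = WORD_BOUNDED.sub("", sample)

print(kept_by_answer)
print(kept_by_variant)
```

With the original pattern the `<abbr>` element survives alongside the anchor; with the word-bounded variant only the anchor remains. Either way, regex-based stripping is fragile on real-world HTML (for example, a `>` inside an attribute value), so treat this as a sketch rather than a robust sanitizer.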

Here at regex101.

Upvotes: 3
