Reputation: 3
I need a regular expression to find text between HTML elements via the Visual Studio search engine (might be C#).
What works to some extent is this:
>\s*([\w])+\s*<
But it has to match all the following "asdf"s:
<element>asdf
<element>asdf.</element>asdf
<element />
asdf asdf
</element>
<element>
asdf!
</element>
What it should NOT find is whitespace between two tags; this example should match NOTHING:
<element>
<element> </element>
</element>
What I need in particular is a regex, that matches:
I don't want matches that contain only special characters and no \w characters.
Another attempt, which doesn't work at all, is this:
>\s*((?=[\w]+)(?=[ ?=()!"_]*))\s*<
What is the correct way to accomplish this?
Thank you so much!
Upvotes: 0
Views: 1182
Reputation: 370879
You can use one lookahead before matching the text between the > and the <:
>(?=[^<]*\w).*?<
(Use the "s" flag so that the dot matches newlines, or use something like [\S\s]*? instead of .*?)
The lookahead ensures that there's a word character between the > and the <. Then, match and lazy-repeat any character until you get to the <.
https://regex101.com/r/cqinyh/2
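To check the pattern against the samples from the question, here is a minimal sketch in Python (chosen only for illustration; the question targets Visual Studio, whose regex dialect may differ in details). The capturing group, the helper name find_text, and the variable names are assumptions added for this demo and are not part of the original pattern. re.DOTALL stands in for the "s" flag.

```python
import re

# Pattern from the answer, with a capturing group added (an assumption for
# this demo) so the text between the tags can be pulled out.
# re.DOTALL plays the role of the "s" flag: the dot also matches newlines.
TEXT_BETWEEN_TAGS = re.compile(r">(?=[^<]*\w)(.*?)<", re.DOTALL)


def find_text(markup):
    """Return the stripped text runs found between a '>' and the next '<'."""
    return [m.group(1).strip() for m in TEXT_BETWEEN_TAGS.finditer(markup)]


# First sample from the question: every "asdf" run should be found.
should_match = """<element>asdf
<element>asdf.</element>asdf
<element />
asdf asdf
</element>
<element>
asdf!
</element>"""

# Second sample from the question: only whitespace between tags.
should_not_match = """<element>
<element> </element>
</element>"""

print(find_text(should_match))      # ['asdf', 'asdf.', 'asdf', 'asdf asdf', 'asdf!']
print(find_text(should_not_match))  # []
```

Each match consumes the closing <, so the search resumes inside the next tag and picks up at that tag's >. That is why adjacent text runs are all found without overlapping.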
Upvotes: 1