Reputation: 833
I'm having trouble naming this question, and it feels like something I should have been able to find myself, but apparently I can't. RegEx is still incredibly complicated to me, so please don't be too harsh on me.
Basically, I have a huge list of text from which I need to extract certain words. I know the pattern surrounding each word, but I obviously only need the word itself. Let me try to give you a simple example:
<b>Name1</b>
<i>Name2</i>
<u>Name3</u>
I can clearly see the things I want are all surrounded by <> tags. My approach was always to find the entire string and then simply do a plain replace to get rid of these extra characters.
<\w>{1}\w+<\/\w>{1}
string = string.replace("<b>", "");
string = string.replace("</b>", "");
... and so on.
However, something just feels wrong about it. Like, incredibly wrong. Can't I just directly say in my RegEx search what exactly I'm looking for? Like:
<\w>{1}START\w+END<\/\w>{1}
Does something like this exist?
(This is a general question, not a specific problem, so please don't provide alternate workarounds or something. I've had this problem many, many times already, and I'm fed up with solving it with this hackish way.)
Upvotes: 0
Views: 89
Reputation: 1909
How about this:
<[^>]+>([^<]+)<\/[^>]+>
It'll match the whole "tag", but the parentheses form a capture group, so it'll only capture what's between the tags. You then read group 1 instead of the full match.
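As a rough illustration of reading the captured group rather than the full match, here is a minimal JavaScript sketch. It assumes a runtime with `String.prototype.matchAll` (ES2020), and the names `markup`, `tagPattern`, and `names` are made up for this example.

```javascript
// Sample input taken from the question.
const markup = "<b>Name1</b>\n<i>Name2</i>\n<u>Name3</u>";

// The pattern from this answer; matchAll requires the g flag.
const tagPattern = /<[^>]+>([^<]+)<\/[^>]+>/g;

// match[0] is the whole tag, match[1] is the first capture group.
const names = Array.from(markup.matchAll(tagPattern), (match) => match[1]);

console.log(names); // [ 'Name1', 'Name2', 'Name3' ]
```

No follow-up replace calls are needed, because the surrounding tags never end up in `match[1]`.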
Upvotes: 1
Reputation: 1096
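A regex with lookarounds like
(?<=<\w>)\w+(?=<\/\w>)
might be what you are looking for. The lookbehind (?<=<\w>) and the lookahead (?=<\/\w>) check for the tags without making them part of the match. Note that the opening check has to be a lookbehind: a negative lookahead (?!<\w>) placed in front of \w+ always succeeds and checks nothing. See example here regextester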
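To make the match itself contain only the word, the opening tag can be checked with a lookbehind and the closing tag with a lookahead. The following is a minimal JavaScript sketch, assuming an engine with lookbehind support (ES2018 or later); `sample` and `bareNames` are hypothetical names chosen for the example.

```javascript
// Sample input taken from the question.
const sample = "<b>Name1</b>\n<i>Name2</i>\n<u>Name3</u>";

// (?<=<\w>) requires an opening tag immediately before the match,
// (?=<\/\w>) requires a closing tag immediately after it.
// Neither lookaround consumes characters, so the match is the bare word.
const bareNames = sample.match(/(?<=<\w>)\w+(?=<\/\w>)/g) || [];

console.log(bareNames); // [ 'Name1', 'Name2', 'Name3' ]
```

This is close to the START/END idea in the question: the lookarounds mark where the wanted text begins and ends. Like the original pattern, it only covers single-character tag names and word-character content.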
Upvotes: 1