Selbi

Reputation: 833

Select start and end of a RegEx

I'm having trouble naming this question, and it feels like something I should have found myself, but it seems I'm too dumb. RegEx is still incredibly complicated to me, so please don't be too harsh on me.

Basically, I have a huge list of text from which I need to extract certain words. I know the pattern around each word, but I obviously only need the word itself. Let me try to give you a simple example:

<b>Name1</b>
<i>Name2</i>
<u>Name3</u>

I can clearly see the things I want are all surrounded by <> tags. My approach was always to find the entire string and then simply do a plain replace to get rid of these extra characters.

<\w>{1}\w+<\/\w>{1}
string.replace("<b>","");
string.replace("</b>","");
... and so on.

However, something just feels wrong about it. Like, incredibly wrong. Can't I just directly say in my RegEx search what exactly I'm looking for? Like:

<\w>{1}START\w+END<\/\w>{1}

Does something like this exist?

(This is a general question, not a specific problem, so please don't provide alternative workarounds. I've had this problem many, many times already, and I'm fed up with solving it in this hackish way.)

Upvotes: 0

Views: 89

Answers (2)

Hunter Eidson

Reputation: 1909

How about <[^>]+>([^<]+)<\/[^>]+>? It matches the whole tagged element, but the parentheses form a capture group, so group 1 holds only what's between the tags.
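To make the capture-group idea concrete, here is a minimal sketch in Python. The language is an assumption, since the question does not name one, and the sample text is just the three lines from the question.

```python
import re

# Sample input taken from the question.
text = "<b>Name1</b>\n<i>Name2</i>\n<u>Name3</u>"

# The parentheses form capture group 1; the tags are matched but not captured.
pattern = re.compile(r"<[^>]+>([^<]+)</[^>]+>")

# group(0) is the whole match, group(1) is only the captured part.
first = pattern.search(text)
print(first.group(0))  # <b>Name1</b>
print(first.group(1))  # Name1

# With exactly one group in the pattern, findall returns the group contents.
names = pattern.findall(text)
print(names)  # ['Name1', 'Name2', 'Name3']
```

No follow-up replace calls are needed: the surrounding tags take part in the match but never end up in the extracted value.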

Upvotes: 1

zchrykng

Reputation: 1096

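A regex with lookarounds, like (?<=<\w>)\w+(?=<\/\w>), might be what you are looking for. The lookbehind (?<=...) requires the opening tag before the word and the lookahead (?=...) requires the closing tag after it, but neither becomes part of the match. See the example on regextester.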
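Lookaround assertions check the text around a match without including it in the result. Here is a minimal sketch in Python (an assumed language; the thread names none), using a lookbehind (?<=<\w>) for the opening tag and a lookahead (?=</\w>) for the closing tag. Python's re module only accepts fixed-width lookbehinds, which <\w> satisfies because it is always three characters.

```python
import re

# Sample input taken from the question.
text = "<b>Name1</b>\n<i>Name2</i>\n<u>Name3</u>"

# (?<=<\w>)  lookbehind: the word must be preceded by a one-letter opening tag
# \w+        the word itself, which is the only part that is matched
# (?=</\w>)  lookahead: the word must be followed by a one-letter closing tag
pattern = re.compile(r"(?<=<\w>)\w+(?=</\w>)")

# Each match is the bare word; the tags are asserted but not consumed.
matches = [m.group(0) for m in pattern.finditer(text)]
print(matches)  # ['Name1', 'Name2', 'Name3']
```

Unlike the capture-group approach, the whole match is already the wanted text, which helps when a tool only exposes the full match. The trade-off is that lookbehind support and its restrictions vary between regex engines.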

Upvotes: 1
