Sosian

Reputation: 622

Capture text inside start/end character but ignore doubled end character

I am trying to get the text inside my start/end characters ("<" and ">") with Regex, while treating a doubled end character inside the text as part of the content (so ">>" should be included in the captured data).

I tried

<([^>]*)>

and

<(.*?)>(?!>)

But am currently failing in following case:

Input:

<test>>Value>

Expected Output:

test>>Value

But my regexes capture only part of the string.

The first one captures

test

and the second

test>

Sadly, I am out of ideas on how to approach the problem further. Do any of you Regex gods have an idea how to solve this?

Edit:

Thanks for the answers. Sadly, they do not meet another requirement I have (which I dropped to keep the question as short as possible, thinking it wouldn't matter... lesson learned):

Input:

<test>>Value><test>

Expected Output:

test>>Value
test

Upvotes: 0

Views: 71

Answers (4)

Michał M

Reputation: 618

(\w+)>{1,2}(\w+)

Or try this, which omits the < at the beginning and the > at the end.

Upvotes: -1

Richard

Reputation: 109035

Using a zero-width negative lookahead assertion to match a > not followed by another > to terminate the match seems the simplest way:

<(.*)>(?!>)

captures test>>more when matched against <test>>more>.

Note that your second regex, <(.*?)>(?!>), uses the lazy (minimal-matching) quantifier, so it stops at the first > that is not followed by another >.
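To make the greedy/lazy difference concrete, here is a small sketch. It uses Python's re module purely for illustration (an assumption on my part; the thread itself does not name a language at this point), and the variable names are made up. It also shows what the greedy version does with the two-group input from the question's edit.

```python
import re

text = "<test>>more>"

# Lazy quantifier: stops at the first '>' that is not followed by another '>'.
lazy = re.search(r"<(.*?)>(?!>)", text).group(1)

# Greedy quantifier: runs to the last '>' that is not followed by another '>'.
greedy = re.search(r"<(.*)>(?!>)", text).group(1)

# With two bracketed groups in one input, the greedy version spans both.
greedy_two = re.search(r"<(.*)>(?!>)", "<test>>more><another>").group(1)

print(lazy)        # test>
print(greedy)      # test>>more
print(greedy_two)  # test>>more><another
```

The last line is why the greedy pattern alone is not enough once several groups can appear in the same input.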

EDIT:

With the additional information, so <test>>more><another> should capture test>>more and another:

 <([^>]*(?:>>[^>]*)*)>

Calling Regex.Matches with this pattern returns both captures.

Expanded

 <       # Match <
 (       # Start capture
  [^>]*  #  Match many non->
  (?:    #  Start non-capturing group
   >>    #   Match >>
   [^>]* #   Match many non->
  )*     #   Repeat zero or more
 )       #  End capture
 >       # Match >

I.e., it breaks up the content of the angle brackets into >> and non-> blocks and matches any number of them. It also handles <>>> (captures >>).
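As a quick check of the pattern above, here is a minimal sketch in Python. The answer refers to Regex.Matches, so re.findall is used here as a rough stand-in, and the helper name extract is hypothetical. re.VERBOSE lets the pattern keep the same layout as the expanded form.

```python
import re

# Same pattern as above; re.VERBOSE ignores the whitespace and comments.
PATTERN = re.compile(r"""
    <            # opening delimiter
    (            # start capture
      [^>]*      #   run of non-'>' characters
      (?:        #   start non-capturing group
        >>       #     a doubled '>'
        [^>]*    #     another run of non-'>' characters
      )*         #   repeat zero or more times
    )            # end capture
    >            # closing delimiter
""", re.VERBOSE)


def extract(text):
    """Return the captured content of every <...> group in text."""
    return PATTERN.findall(text)


print(extract("<test>>more><another>"))  # ['test>>more', 'another']
print(extract("<>>>"))                   # ['>>']
```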

Upvotes: 2

SamWhan

Reputation: 8332

Here's my go at it :)

<((?:>>|[^>])*)>

It starts by matching the opening <, then repeatedly tries to match >> or, failing that, any single character other than >, until the closing > is found.

It also works with the added requirements ;)

Check it out here at regex101.
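For anyone who wants to try it offline, a minimal sketch in Python (chosen only for illustration; the helper name capture_all is hypothetical), run against both inputs from the question:

```python
import re

# Alternation version: at each step consume either a doubled '>' or one non-'>' character.
PATTERN = re.compile(r"<((?:>>|[^>])*)>")


def capture_all(text):
    """Return the captured content of every <...> group in text."""
    return PATTERN.findall(text)


print(capture_all("<test>>Value>"))        # ['test>>Value']
print(capture_all("<test>>Value><test>"))  # ['test>>Value', 'test']
```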

Upvotes: 2

Shekhar Khairnar

Reputation: 2691

You can use the following and take the first capture group:

(?:\<)(.*)(?:\>)

Demo and Explanation

Upvotes: 1
