Regexp to read to a plus sign

Question

I am using the regexp below successfully to read the text between my tags, until I hit a case where a < sign is embedded in the data between the tags. To fix this, I want to read between a +> and a </+ instead. There is no way that combination would appear in the database I'm pulling from. When I try to change the code below to do this, I get stuck. Any ideas?



Code:

@fieldValues =  $inFileLine =~ m(>([^<]+)<)g;


My sorry attempt at modifying the code:

@fieldValues =  $inFileLine =~ m(\+>([^<\/\+]+)<\/\+)g;


Data:

<+RecordID+>SWCR000111<+Title+>My Title Is < Than Yours
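A minimal Perl sketch (not part of the original question) showing how the two patterns above behave. It assumes each field is closed by a tag like </+RecordID+> or </+Title+>. Those closing tags look like they were lost from the data line above, so the sample record is a reconstruction, not the poster's real data.

```perl
use strict;
use warnings;

# Assumed sample record: the closing tags (</+RecordID+>, </+Title+>) are a
# guess, since they appear to be missing from the data shown above.
my $inFileLine = '<+RecordID+>SWCR000111</+RecordID+><+Title+>My Title Is < Than Yours</+Title+>';

# Original regex: capture a run of non-'<' characters that follows a '>'.
# The '<' embedded in the title ends that capture early.
my @original = $inFileLine =~ m(>([^<]+)<)g;

# Attempted fix: the character class still excludes '<', so the title
# (which contains '<') can never reach the closing </+ and is skipped.
my @attempt = $inFileLine =~ m(\+>([^<\/\+]+)<\/\+)g;

# Brackets make the trailing space in the truncated title visible.
print join(', ', map { "[$_]" } @original), "\n";   # [SWCR000111], [My Title Is ]
print join(', ', map { "[$_]" } @attempt), "\n";    # [SWCR000111]
```

The original pattern stops at the first <, truncating the title, while the attempted fix cannot get past that < at all, so the title is dropped entirely.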

Wiktor Stribiżew · Accepted Answer

Since it works for you, given that +> can never be directly followed by <+ in your data, I am posting my comment as an answer.

This regex should be safe to use even with very large files:

\+>(?!<\+)([^<]*(?:<(?!\/\+)[^<]*)*)<\/\+
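One way to apply this regex in the question's setting, sketched under the same assumption that fields are closed by tags like </+Title+>. The helper name extract_fields is hypothetical, and an in-memory filehandle stands in for the real input file.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the regex above: returns the text between
# each "+>" and the nearest following "</+" on one line.
sub extract_fields {
    my ($inFileLine) = @_;
    return $inFileLine =~ /\+>(?!<\+)([^<]*(?:<(?!\/\+)[^<]*)*)<\/\+/g;
}

# Assumed input: the closing tags are a guess, because they seem to have
# been stripped from the data shown in the question.
my $data = "<+RecordID+>SWCR000111</+RecordID+><+Title+>My Title Is < Than Yours</+Title+>\n";

# Read line by line, as the question does with $inFileLine.
open my $in, '<', \$data or die "Cannot open in-memory file: $!";
my @records;
while (my $inFileLine = <$in>) {
    chomp $inFileLine;
    my @fieldValues = extract_fields($inFileLine);
    push @records, [@fieldValues];
    print join(' | ', @fieldValues), "\n";   # SWCR000111 | My Title Is < Than Yours
}
close $in;
```

In list context the /g match returns every Group 1 capture, so each element of @fieldValues is one field value, including the title with its embedded <.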

Here is what it is doing:

\+>(?!<\+) - matches +> (with \+>) that is not followed by <+ (due to the negative lookahead (?!<\+))
([^<]*(?:<(?!\/\+)[^<]*)*) - matches and stores in Group 1:
- [^<]* - 0 or more characters other than <, followed by...
- (?:<(?!\/\+)[^<]*)* - 0 or more sequences of...
  - <(?!\/\+) - a < that is not followed by /+, and then
  - [^<]* - 0 or more characters other than <
<\/\+ - matches the final </+
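To see what the (?!<\+) lookahead buys, this sketch runs the pattern with and without it on the same assumed sample record (closing tags such as </+RecordID+> are a reconstruction). Without the lookahead, the +> that ends the closing tag </+RecordID+> is treated as an opening delimiter, and the next tag leaks into the capture.

```perl
use strict;
use warnings;

# Assumed sample record with reconstructed closing tags.
my $inFileLine = '<+RecordID+>SWCR000111</+RecordID+><+Title+>My Title Is < Than Yours</+Title+>';

# Full pattern from the answer.
my @with = $inFileLine =~ /\+>(?!<\+)([^<]*(?:<(?!\/\+)[^<]*)*)<\/\+/g;

# Same pattern with the (?!<\+) lookahead removed, for comparison only.
my @without = $inFileLine =~ /\+>([^<]*(?:<(?!\/\+)[^<]*)*)<\/\+/g;

print join(', ', map { "[$_]" } @with), "\n";
# [SWCR000111], [My Title Is < Than Yours]
print join(', ', map { "[$_]" } @without), "\n";
# [SWCR000111], [<+Title+>My Title Is < Than Yours]
```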



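In short, this is equivalent to \+>(?!<\+)([\s\S]*?)<\/\+, but "unwrapped" using the unrolling-the-loop technique. Where the lazy [\s\S]*? conceptually advances one character at a time and tries the closing </+ at each step, the unrolled form consumes whole runs of non-< characters with [^<]* and only checks for /+ right after a <. This lets it handle large portions of text between the delimiters (that is, between +> and the closest </+).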
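A quick way to check the equivalence claim in Perl, using hypothetical inputs with assumed </+Name+> closing tags: run both patterns over a few samples, including one long field full of < characters, and compare the captures. This only checks that the results agree; it does not measure speed.

```perl
use strict;
use warnings;

# The unrolled pattern from the answer and the equivalent lazy pattern.
my $unrolled = qr/\+>(?!<\+)([^<]*(?:<(?!\/\+)[^<]*)*)<\/\+/;
my $lazy     = qr/\+>(?!<\+)([\s\S]*?)<\/\+/;

# Hypothetical test inputs; closing tags of the form </+Name+> are assumed.
my @samples = (
    '<+RecordID+>SWCR000111</+RecordID+><+Title+>My Title Is < Than Yours</+Title+>',
    '<+Note+>' . ('x < y ' x 10_000) . '</+Note+>',   # long field full of '<'
    '<+Empty+></+Empty+>',                           # empty field
);

# Return true if both array refs hold the same strings in the same order.
sub same_list {
    my ($x, $y) = @_;
    return 0 unless @$x == @$y;
    for my $i (0 .. $#$x) {
        return 0 unless $x->[$i] eq $y->[$i];
    }
    return 1;
}

my $all_same = 1;
for my $s (@samples) {
    my @u = $s =~ /$unrolled/g;   # all Group 1 captures, unrolled version
    my @l = $s =~ /$lazy/g;       # all Group 1 captures, lazy version
    $all_same = 0 unless same_list(\@u, \@l);
}
print $all_same ? "identical captures\n" : "captures differ\n";
```

Both patterns stop at the first </+ after an accepted +>, which is why their captures match; the unrolled one simply gets there with fewer steps per character.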
