ahhmarr

Reputation: 2320

extract the first occurrence only

Data:

#r; 
 text
#r;

#r; 
  text2
#r;

Regex:

/#r;[\w\W]*#r;/

I just want to extract the first occurrence only (i.e. #r;text#r;). However, the pattern above matches both occurrences as a single result.

What should I do in order to get only the first occurrence?

Upvotes: 0

Views: 382

Answers (3)

Mike Samuel

Reputation: 120496

Your problem is that the * is greedy: it matches as much as possible and does not stop at the first closing delimiter, so it ends up consuming " text\n#r;\n\n#r;\n text2\n" instead of just " text\n". The solution is to make the * lazy:

/#r;[\w\W]*?#r;/

The non-greedy quantifier (the ? after the *) causes the * to match only as much as is needed for the regular expression as a whole to match.

http://www.regular-expressions.info/possessive.html has more info:

A greedy quantifier will first try to repeat the token as many times as possible, and gradually give up matches as the engine backtracks to find an overall match. A lazy quantifier will first repeat the token as few times as required, and gradually expand the match as the engine backtracks through the regex to find an overall match.
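
As a rough illustration of the difference, the sketch below runs both patterns against a string modeled on the question's data. The variable names (data, greedy, lazy) are made up for this example, and the exact whitespace in the real input is an assumption.

```javascript
// Sample input modeled on the question's data: two delimited blocks.
const data = "#r; \n text\n#r;\n\n#r; \n  text2\n#r;";

// Greedy: [\w\W]* first runs to the end of the input, then backtracks
// only as far as the LAST "#r;", so both blocks end up in one match.
const greedy = data.match(/#r;[\w\W]*#r;/)[0];

// Lazy: [\w\W]*? grows one character at a time and stops as soon as
// the FIRST closing "#r;" can be matched.
const lazy = data.match(/#r;[\w\W]*?#r;/)[0];

console.log(greedy === data);      // true
console.log(JSON.stringify(lazy)); // "#r; \n text\n#r;"
```

Without the g flag, match() returns only the first match, so the lazy version is enough to get the first occurrence.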

Upvotes: 0

jfriend00

Reputation: 707238

Option 4 below is the recommended one.

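Option 1: Without lookaheads, you can use a non-greedy wildcard match. Note that in JavaScript the . does not match newline characters, so this works only when the delimiters and the text are on the same line: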

/#r;.*?#r;/

This matches:

a pattern that starts with "#r;"
followed by any number of characters, but the fewest possible
followed by "#r;"
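
One thing worth checking with this pattern is how . treats newlines. The sketch below is only an illustration: the variable names are invented here, and the multi-line string is an assumption about what the question's data looks like.

```javascript
// Single-line input: "." can reach the first closing delimiter.
const oneLine = "#r;text1#r;#r;text2#r;";
const firstOnOneLine = oneLine.match(/#r;.*?#r;/)[0];

// Multi-line input shaped like the question's data. Without the "s"
// flag, "." stops at "\n", so the pattern finds nothing here.
const multiLine = "#r; \n text\n#r;";
const withoutDotAll = multiLine.match(/#r;.*?#r;/);

// With the ES2018 "s" (dotAll) flag, "." also matches newlines.
const withDotAll = multiLine.match(/#r;.*?#r;/s)[0];

console.log(firstOnOneLine);           // #r;text1#r;
console.log(withoutDotAll);            // null
console.log(withDotAll === multiLine); // true
```

The s flag needs an engine that supports ES2018; on older engines, [\w\W] in place of . gives the same effect.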

Option 2: Or, if you want just the text between the delimiters, add a capture group and read index [1] of the match result:

/#r;(.*?)#r;/

"#r;text1#r;#r;text2#r;".match(/#r;(.*?)#r;/)[1] == "text1"

You can see it in action here: http://jsfiddle.net/jfriend00/ZYdP8/

Option 3: Or, if there are actually newlines immediately after the opening #r; and before the closing #r; in the input, you would use this regex:

/#r;\n(.*?)\n#r;/

which you can see working here: http://jsfiddle.net/jfriend00/ZYdP8/10/.

Option 4: Or (taking Tom's suggestion), if you don't want any whitespace of any kind to be part of the match on the boundaries, you can use this:

/#r;\s*(.*?)\s*#r;/

which you can see working here: http://jsfiddle.net/jfriend00/ZYdP8/12/.
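
For reference, here is a small sketch of Option 4 applied to a string modeled on the question's data. The names input, m and firstText are hypothetical, and the exact whitespace in the real data is an assumption.

```javascript
// Input modeled on the question's data: two blocks with newlines
// and stray spaces around the text.
const input = "#r; \n text\n#r;\n\n#r; \n  text2\n#r;";

// No "g" flag, so match() stops at the first occurrence.
// \s* absorbs the whitespace on both sides; group 1 holds the text.
const m = input.match(/#r;\s*(.*?)\s*#r;/);
const firstText = m ? m[1] : null;

console.log(firstText); // text
```

If the text between the delimiters can itself span several lines, the . inside the group would need to become [\w\W], since . does not match newlines by default.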

Upvotes: 3

Eric

Reputation: 8078

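Try this. The lazy quantifier stops at the first closing #r;, and the lookahead keeps that closing delimiter out of the match:

/#r;[\w\W]*?(?=#r;)/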

Upvotes: 0
