Chris

Reputation: 58242

How do I get lazy matching to match correctly?

Given a string (as seen in the examples below), I would like to extract the following into three groups:

  1. Group 1: Is the first character a # or not
  2. Group 2: Capture the string between the # (if it exists) and the opening square bracket (if it exists)
  3. Group 3: Capture the contents of the square brackets (without the square brackets)

At this stage I have the following regular expression:

/^(#)?(.*?)\[?(.*?)\]?$/

I am using http://gskinner.com/RegExr/ as my testing tool with multiline and global turned on.

Example 1:

#Sprite[abc]

Expected Result

  1. Group 1: #
  2. Group 2: Sprite
  3. Group 3: abc

Actual Result

  1. Group 1: #
  2. Group 2: // Empty, not NO MATCH
  3. Group 3: Sprite[abc // No trailing ]

Example 2:

#Sprite

Expected Result

  1. Group 1: #
  2. Group 2: Sprite
  3. Group 3: [NO MATCH]

Actual Result

  1. Group 1: #
  2. Group 2:
  3. Group 3: Sprite

Example 3:

Sprite

Expected Result

  1. Group 1: [NO MATCH]
  2. Group 2: Sprite
  3. Group 3: [NO MATCH]

Actual Result

  1. Group 1: [NO MATCH]
  2. Group 2: // empty
  3. Group 3: Sprite

Example 4:

Sprite[abc]

Expected Result

  1. Group 1: [NO MATCH]
  2. Group 2: Sprite
  3. Group 3: abc

Actual Result

  1. Group 1: [NO MATCH]
  2. Group 2: // empty
  3. Group 3: Sprite[abc

To me it feels like the lazy match in the expression above isn't really being lazy. Shouldn't it hit the [, stop, close the group, and move on?
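A minimal sketch that reproduces the actual results above, assuming Python's re module as a stand-in for RegExr (for this pattern Python returns the same groups listed above). Laziness only changes the order in which lengths are tried: the first (.*?) starts with the empty string and grows only if the rest of the pattern fails. Here the rest never fails. \[? can match nothing, the second (.*?) can absorb everything up to the final ], and \]? then consumes that ]. So the first attempt, with group 2 empty, already succeeds, and group 2 comes back as '' for all four inputs.

```python
import re

# The pattern from the question, unchanged.
QUESTION_PATTERN = re.compile(r'^(#)?(.*?)\[?(.*?)\]?$')

samples = ["#Sprite[abc]", "#Sprite", "Sprite", "Sprite[abc]"]

for s in samples:
    m = QUESTION_PATTERN.match(s)
    # Group 2 is '' (an empty match, not None): because \[? may match
    # nothing, the engine never has a reason to extend the lazy group 2.
    print(repr(s), m.groups())
```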

Upvotes: 1

Views: 100

Answers (3)

hsz

Reputation: 152266

You can try:

^(#)?([^\[]*)(?:\[(.*?)\])?$
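A quick check of this pattern against the four examples, sketched with Python's re module (an assumption; the asker used RegExr). [^\[]* cannot consume a [, so group 2 has to stop at the first opening bracket, and the whole bracketed part is a single optional non-capturing group.

```python
import re

# hsz's pattern, unchanged.
HSZ_PATTERN = re.compile(r'^(#)?([^\[]*)(?:\[(.*?)\])?$')

for s in ["#Sprite[abc]", "#Sprite", "Sprite", "Sprite[abc]"]:
    # None in the output corresponds to "[NO MATCH]" in the question.
    print(repr(s), HSZ_PATTERN.match(s).groups())
```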

Upvotes: 0

jcollado

Reputation: 40414

I've successfully used the following expression in Python:

import re
regex = re.compile(r'^(#)?(.*?)(?:\[(.*?)\])?$')

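The problem was basically the question marks after the individual brackets: a ? right after .*? makes laziness difficult, because once \[ is optional the lazy group can match nothing and leave everything to the next group. Now a single question mark applies to the whole bracketed expression, that is, (?:\[(.*?)\])?.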

Note: (?:...) is a non-capturing group, used so that the bracketed expression as a whole isn't captured (I don't know whether the tool you're using supports it).
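A small usage sketch for the compiled expression above, checking it against the expected results from the question (None stands for "[NO MATCH]"). The expected dictionary is just the question's examples transcribed; nothing else is assumed beyond Python's standard re module.

```python
import re

# jcollado's expression: the optional ? now covers the whole "[...]" part,
# so (.*?) keeps growing until "[...]" or the end of the string lets the
# rest of the pattern match.
regex = re.compile(r'^(#)?(.*?)(?:\[(.*?)\])?$')

# Expected groups from the question; None stands for "[NO MATCH]".
expected = {
    "#Sprite[abc]": ("#", "Sprite", "abc"),
    "#Sprite": ("#", "Sprite", None),
    "Sprite": (None, "Sprite", None),
    "Sprite[abc]": (None, "Sprite", "abc"),
}

for text, groups in expected.items():
    m = regex.match(text)
    assert m is not None and m.groups() == groups, (text, m and m.groups())
print("all four examples match")
```

One behavioral difference from the [^\[]* versions: because (.*?) can still consume a [, an unclosed input such as "Sprite[abc" matches here with group 2 = "Sprite[abc" and group 3 = None, whereas hsz's anchored pattern finds no match for it.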

Upvotes: 1

Tim Pietzcker

Reputation: 336368

Better to be specific rather than lazy :)

(#)?([^\[]*)(?:\[([^\]]*)\])?$

works on your examples. Translation:

(\#)?       # Match # (optional)
([^\[]*)    # Match any characters except [
(?:         # Try to match...
 \[         #  [, followed by
 ([^\]]*)   #  any characters except ], followed by
 \]         #  ]
)?          # optionally
$           # Match end of string.
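The commented translation above can be used directly as a pattern in Python's re.VERBOSE mode, sketched below (Python is an assumption; the question used RegExr). In verbose mode whitespace is ignored and an unescaped # starts a comment, which is why the literal # is written as \# in the translation. Note that the pattern itself has no ^; re.match supplies the anchoring at the start of the string here.

```python
import re

# Tim Pietzcker's pattern, written in verbose mode exactly as translated.
TIM_PATTERN = re.compile(r"""
    (\#)?        # Match # (optional)
    ([^\[]*)     # Match any characters except [
    (?:          # Try to match...
      \[         #  [, followed by
      ([^\]]*)   #  any characters except ], followed by
      \]         #  ]
    )?           # optionally
    $            # Match end of string.
""", re.VERBOSE)

for s in ["#Sprite[abc]", "#Sprite", "Sprite", "Sprite[abc]"]:
    # re.match only tries a match starting at position 0.
    print(repr(s), TIM_PATTERN.match(s).groups())
```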

Upvotes: 2
