rodders

Reputation: 354

Making group optional at end of regex causes it to be never matched

I have a regex in PHP to match some text like this:

24th Meeting - The quick brown fox [10 January 2012 to 26 September 2012]

The pattern I've come up with looks like this:

$pattern = "/(([0-9]{1,2})(st|nd|rd|th)\sMeeting\s-\s)?(.*)(\[([0-9]{1,2}\s(January|February|March|April|May|June|July|August|September|October|November|December)\s[0-9]{4})\sto\s([0-9]{1,2}\s(January|February|March|April|May|June|July|August|September|October|November|December)\s[0-9]{4})\])$/";

This seems to work fine.

However, I would like the date portion at the end to be optional. But when I add a ? after the date group, preg_match no longer pulls out the dates, even if they are in the string. I suspect that the .* is taking over, but I can't seem to get it

Upvotes: 0

Views: 78

Answers (3)

bukart

Reputation: 4906

Two small changes will do it:

First, the free-text expression (.*) is extended by a ? to make it ungreedy, giving (.*?) (see the other posts).

Then |$ is appended inside the date group, so that group matches either exactly the date range or the end of the string.

Here's your total regex

/(([0-9]{1,2})(st|nd|rd|th)\sMeeting\s-\s)?(.*?)(\[([0-9]{1,2}\s(January|February|March|April|May|June|July|August|September|October|November|December)\s[0-9]{4})\sto\s([0-9]{1,2}\s(January|February|March|April|May|June|July|August|September|October|November|December)\s[0-9]{4})\]|$)$/
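A quick way to check this is to run the pattern against a title with dates and one without. The sketch below keeps the same group layout as the regex above. The helper variables $months and $date, the function parse_title and the array keys are made up for the example, and $months is assumed to list all twelve month names.

```php
<?php
// $months and $date are helper variables introduced for this sketch only.
$months = 'January|February|March|April|May|June|July|August|September|October|November|December';
$date   = '[0-9]{1,2}\s(' . $months . ')\s[0-9]{4}';

// Same structure as above: lazy free text, then either the bracketed
// date range or the end of the string.
$pattern = '/(([0-9]{1,2})(st|nd|rd|th)\sMeeting\s-\s)?(.*?)'
         . '(\[(' . $date . ')\sto\s(' . $date . ')\]|$)$/';

// Hypothetical helper: returns the free text and the two dates (or null).
function parse_title(string $pattern, string $subject): ?array
{
    if (!preg_match($pattern, $subject, $m)) {
        return null;
    }
    return [
        // The lazy group keeps the space before "[", so trim it.
        'title' => trim($m[4]),
        // Groups 6 and 8 are absent from $m when the "$" branch matched.
        'from'  => $m[6] ?? null,
        'to'    => $m[8] ?? null,
    ];
}

$with    = parse_title($pattern, '24th Meeting - The quick brown fox [10 January 2012 to 26 September 2012]');
$without = parse_title($pattern, '24th Meeting - The quick brown fox');

var_dump($with, $without);
```

With the dates present, from and to should hold the two dates; without them, both should be null while title is still filled in.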

Upvotes: 0

Firas Dib

Reputation: 2621

Just as you presumed, the .* (greedy quantifier) eats up too much information. This can be solved either by making it lazy or replacing it with something else such as [^[]*. However, replacing it with the latter suggestion will disallow any use of the literal [ in the string.
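To see the three behaviours side by side, here is a minimal sketch. It uses a simplified stand-in for the date group (any bracketed text at the end of the string) rather than the full pattern, and the names $patterns and $results are made up for the example.

```php
<?php
$subject = 'The quick brown fox [10 January 2012 to 26 September 2012]';

// Simplified stand-in: the optional group is any [...] at the end of the string.
$patterns = [
    'greedy'  => '/^(.*)(\[[^\]]*\])?$/',     // .* swallows the brackets too
    'lazy'    => '/^(.*?)(\[[^\]]*\])?$/',    // .*? stops as soon as the rest can match
    'negated' => '/^([^\[]*)(\[[^\]]*\])?$/', // free text may not contain "["
];

$results = [];
foreach ($patterns as $name => $pattern) {
    preg_match($pattern, $subject, $m);
    // preg_match drops trailing groups that did not take part in the match,
    // so fall back to null when group 2 is missing.
    $results[$name] = $m[2] ?? null;
}

var_dump($results);
```

Only the greedy variant should leave the bracketed group empty; the other two capture it.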

What you should also do, besides fixing this issue, is learn to use non-capturing groups for the parts you don't need saved. This will speed up your regexes and save some memory.
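As a small illustration of the difference, the sketch below matches a shortened version of the meeting prefix twice, once with a capturing group around the ordinal suffix and once with a non-capturing (?:...) group. The shortened pattern and the variable names are assumptions for the example, not the full regex.

```php
<?php
$subject = '24th Meeting - The quick brown fox';

// Capturing version: the suffix alternation gets a group number of its own.
preg_match('/^([0-9]{1,2})(st|nd|rd|th)\sMeeting\s-\s(.*)$/', $subject, $capturing);

// Non-capturing version: (?:...) still groups the alternation but stores nothing.
preg_match('/^([0-9]{1,2})(?:st|nd|rd|th)\sMeeting\s-\s(.*)$/', $subject, $nonCapturing);

var_dump(count($capturing));    // whole match plus three groups
var_dump(count($nonCapturing)); // whole match plus two groups
var_dump($nonCapturing[2]);     // the free text is now group 2 instead of group 3
```

Note that switching a group to (?:...) shifts the numbers of every capturing group after it, so any code that reads $matches by index has to be adjusted.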

Here's my solution to your problem. Not much changed, but I'm sure you can spot the differences.

/(([0-9]{1,2})(st|nd|rd|th)\sMeeting\s-\s)?(.*?)(\[([0-9]{1,2}\s(January|February|March|April|May|June|July|August|September|October|November|December)\s[0-9]{4})\sto\s([0-9]{1,2}\s(January|February|March|April|May|June|July|August|September|October|November|December)\s[0-9]{4})\])?$/

You can view a demo and an explanation to the regular expression here: http://regex101.com/r/vZ1nH6

The website uses PHP so it's accurate to your problem. If you are interested in learning more I suggest you read up on regular expressions over at www.regular-expressions.info and have a look at the quiz over at http://www.regex101.com/quiz/

Upvotes: 0

staafl

Reputation: 3225

(.*) --> (.*?)

Read more about lazy quantifiers here:

http://www.regular-expressions.info/repeat.html

Upvotes: 1
