Reputation: 354
I have a regex in PHP to match some text like this:
24th Meeting - The quick brown fox [10 January 2012 to 26 September 2012]
The pattern I've come up with looks like this:
$pattern = "/(([0-9]{1,2})(st|nd|rd|th)\sMeeting\s-\s)?(.*)(\[([0-9]{1,2}\s(January|February|March|April|May|June|July|August|September|October|November|December)\s[0-9]{4})\sto\s([0-9]{1,2}\s(January|February|March|April|May|June|July|August|September|October|November|December)\s[0-9]{4})\])$/";
This seems to work fine.
However, I would like the date portion at the end to be optional. But when I add a ? after the date group, preg_match no longer captures the dates, even when they are present in the string. I suspect the .* is swallowing them, but I can't seem to get it working.
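To make the failure concrete, here is a minimal PHP reproduction, a sketch built from the sample line above: it adds ? after the date group and checks what ends up in the title group. The month list here includes all twelve months, and the variable names are my own.

```php
<?php
// Reproduction: the question's pattern with "?" added after the date group.
$months = '(January|February|March|April|May|June|July|August|September|October|November|December)';
$date   = '([0-9]{1,2}\s' . $months . '\s[0-9]{4})';
$greedy = '/(([0-9]{1,2})(st|nd|rd|th)\sMeeting\s-\s)?(.*)(\[' . $date . '\sto\s' . $date . '\])?$/';

$subject = '24th Meeting - The quick brown fox [10 January 2012 to 26 September 2012]';
preg_match($greedy, $subject, $m);

// The greedy (.*) runs to the end of the line, the optional date group then
// matches nothing, and $ succeeds, so the engine never backtracks.
echo $m[4], PHP_EOL; // prints: The quick brown fox [10 January 2012 to 26 September 2012]

// Group 6 (start date) was never captured.
var_dump(isset($m[6]) && $m[6] !== '');
```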
Upvotes: 0
Views: 78
Reputation: 4906
These two small changes will do it:
First, the free-text expression (.*) is extended by a ? to (.*?), which makes it lazy (ungreedy; see the other posts).
Then |$ is appended inside the date group, so that the group matches either exactly the date range or the end of the string.
Here's the complete regex:
/(([0-9]{1,2})(st|nd|rd|th)\sMeeting\s-\s)?(.*?)(\[([0-9]{1,2}\s(January|February|March|April|May|June|July|August|September|October|November|December)\s[0-9]{4})\sto\s([0-9]{1,2}\s(January|February|March|April|May|June|July|August|September|October|November|December)\s[0-9]{4})\]|$)$/
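A sketch for checking this pattern in PHP against a line with and without the date range. parseMeeting() is a hypothetical helper, the second sample line is made up for illustration, and the month list includes October.

```php
<?php
// Answer's pattern: lazy (.*?) plus "|$" inside the date group.
$months = '(January|February|March|April|May|June|July|August|September|October|November|December)';
$date   = '([0-9]{1,2}\s' . $months . '\s[0-9]{4})';
$lazy   = '/(([0-9]{1,2})(st|nd|rd|th)\sMeeting\s-\s)?(.*?)(\[' . $date . '\sto\s' . $date . '\]|$)$/';

// Hypothetical helper: pulls the interesting groups out of one line.
function parseMeeting(string $pattern, string $subject): array
{
    if (!preg_match($pattern, $subject, $m)) {
        return [];
    }
    // Groups that did not take part in the match may be missing, hence "??".
    return [
        'number' => $m[2] ?? '',
        'title'  => trim($m[4] ?? ''), // lazy title keeps the space before "["
        'from'   => $m[6] ?? '',
        'to'     => $m[8] ?? '',
    ];
}

print_r(parseMeeting($lazy, '24th Meeting - The quick brown fox [10 January 2012 to 26 September 2012]'));
print_r(parseMeeting($lazy, '24th Meeting - The quick brown fox'));
```

In the dated case the lazy title stops just before [, so it keeps the trailing space; trimming it is a choice in the helper, not part of the regex.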
Upvotes: 0
Reputation: 2621
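Just as you presumed, the greedy .* quantifier eats up too much of the string. Because the date group is optional, .* can run all the way to the end, the optional group then matches nothing, and $ still succeeds, so the engine never has to give characters back for the dates. This can be solved either by making it lazy (.*?) or by replacing it with something more restrictive, such as [^[]*. However, the latter will disallow any literal [ in the free text.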
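A sketch of the [^[]* variant, written as [^\[]* (the same class with the bracket escaped) and left greedy. It assumes titles never contain a literal [; the sample lines are the question's line plus a made-up one without dates.

```php
<?php
// Title class cannot cross "[", so the optional date group gets its turn.
$months    = '(January|February|March|April|May|June|July|August|September|October|November|December)';
$date      = '([0-9]{1,2}\s' . $months . '\s[0-9]{4})';
$noBracket = '/(([0-9]{1,2})(st|nd|rd|th)\sMeeting\s-\s)?([^\[]*)(\[' . $date . '\sto\s' . $date . '\])?$/';

foreach ([
    '24th Meeting - The quick brown fox [10 January 2012 to 26 September 2012]',
    '24th Meeting - The quick brown fox',
] as $subject) {
    preg_match($noBracket, $subject, $m);
    // Missing date groups are printed as "-".
    echo trim($m[4]), ' | ', $m[6] ?? '-', ' | ', $m[8] ?? '-', PHP_EOL;
}
```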
What you should also do, besides fixing this issue, is learn to use non-capturing groups (?:...) for the parts you don't need saved. This can speed up your regexes and saves some memory.
Here's my solution to your problem. Not much changed, but I'm sure you can spot the differences.
/(?:([0-9]{1,2})(?:st|nd|rd|th)\sMeeting\s-\s)?(.*?)(?:\[([0-9]{1,2}\s(?:January|February|March|April|May|June|July|August|September|October|November|December)\s[0-9]{4})\sto\s([0-9]{1,2}\s(?:January|February|March|April|May|June|July|August|September|October|November|December)\s[0-9]{4})\])?$/
You can view a demo and an explanation of the regular expression here: http://regex101.com/r/vZ1nH6
The website uses PHP's regex engine (PCRE), so the results should match what you get from preg_match. If you are interested in learning more, I suggest you read up on regular expressions over at www.regular-expressions.info and have a look at the quiz over at http://www.regex101.com/quiz/
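Building on the non-capturing advice, the parts you do keep can also be named so you don't have to count parentheses. Named groups ((?<name>...)) are my addition, not part of the answer, and so is the \s* before the date group, which keeps the trailing space out of the title. A minimal sketch:

```php
<?php
// Lazy title, non-capturing helper groups, and named groups for the kept parts.
$month = '(?:January|February|March|April|May|June|July|August|September|October|November|December)';
$date  = '[0-9]{1,2}\s' . $month . '\s[0-9]{4}';
$named = '/(?:(?<number>[0-9]{1,2})(?:st|nd|rd|th)\sMeeting\s-\s)?(?<title>.*?)\s*'
       . '(?:\[(?<from>' . $date . ')\sto\s(?<to>' . $date . ')\])?$/';

preg_match($named, '24th Meeting - The quick brown fox [10 January 2012 to 26 September 2012]', $m);
echo $m['number'], ' / ', $m['title'], ' / ', $m['from'], ' / ', $m['to'], PHP_EOL;
// prints: 24 / The quick brown fox / 10 January 2012 / 26 September 2012
```

When the date range is absent, preg_match leaves out the trailing unmatched groups (unless you pass PREG_UNMATCHED_AS_NULL), so read from and to with ?? ''.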
Upvotes: 0
Reputation: 3225
(.*) --> (.*?)
Read more about lazy quantifiers here:
http://www.regular-expressions.info/repeat.html
Upvotes: 1