Reputation: 1816
I have this string:
Fully furnished self contained 2 bedroom suite just 5 minute walk to UVIC is available for September 1.
Now I'm using preg_match to extract the date. Here is the regex:
'/\bavailable\\s(?P<date_available>[?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?|immediately]+[\\s\d]+)[st|nd|rd|th]?/i'
Currently this regex can extract the date from strings like these:
Available september 1st.
Available September 2nd
available september 3rd
available september 4th
available sept 1
Example output:
Array
(
[0] => available September 1
[date_available] => September 1
[1] => September 1
)
But I cannot find a way to extract the date when another word comes between "available" and the month, as in:
Available for september 1st.
Available in September 2nd
available since september 3rd
available at september 4th
Can anyone help me deal with this? Thanks.
Upvotes: 0
Views: 104
Reputation: 53
I can't actually get yours to work at all. It looks as though you're trying to use character classes with square brackets [ ] rather than grouping and alternating with parentheses ( ).
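To illustrate the difference, here is a small sketch (the test strings are made up for illustration). A character class such as [st|nd|rd|th] matches exactly one character from the set s, t, n, d, r, h and a literal "|", whereas the group (?:st|nd|rd|th) matches one of the four whole suffixes.

```php
<?php
// A character class matches ONE character from the listed set;
// the "|" inside it is just a literal pipe, not alternation.
$classPattern = '/^1[st|nd|rd|th]$/';

// A non-capturing group with alternation matches one whole suffix.
$groupPattern = '/^1(?:st|nd|rd|th)$/';

$inputs = ['1st', '1s', '1|'];
foreach ($inputs as $input) {
    printf(
        "%-4s class:%d group:%d\n",
        $input,
        preg_match($classPattern, $input),
        preg_match($groupPattern, $input)
    );
}
// Prints:
// 1st  class:0 group:1
// 1s   class:1 group:0
// 1|   class:1 group:0
```

So the trailing [st|nd|rd|th]? in the question can never consume a full "st" or "nd"; it only ever looks at a single character.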
The following is probably the shortest I can get it, based on your requirements:
$pattern = '/\bavailable\s+(?:(?:for|in|at|since)\s+)?((?:immediately|now)|(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|June?|July?|Aug(?:ust)?|Oct(?:ober)?|(?:Sept|Nov|Dec)(?:ember)?)\s+?\d{1,2}(?:st|nd|rd|th)?)/i';
This doesn't include the named subpattern, since the required match will always be in $matches[1]. However, if you want a named subpattern, you can always add one:
$pattern = '/\bavailable\s+(?:(?:for|in|at|since)\s+)?(?P<date_available>(?:immediately|now)|(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|June?|July?|Aug(?:ust)?|Oct(?:ober)?|(?:Sept|Nov|Dec)(?:ember)?)\s+?\d{1,2}(?:st|nd|rd|th)?)/i';
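A minimal usage sketch for the named-subpattern version, run against the strings from the question that the original regex could not handle. The $lines array and the $found collection are just for illustration.

```php
<?php
// Pattern from this answer, with the named subpattern "date_available".
$pattern = '/\bavailable\s+(?:(?:for|in|at|since)\s+)?(?P<date_available>(?:immediately|now)|(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|June?|July?|Aug(?:ust)?|Oct(?:ober)?|(?:Sept|Nov|Dec)(?:ember)?)\s+?\d{1,2}(?:st|nd|rd|th)?)/i';

// Sample inputs taken from the question.
$lines = [
    'Available for september 1st.',
    'Available in September 2nd',
    'available since september 3rd',
    'available at september 4th',
    'Fully furnished self contained 2 bedroom suite just 5 minute walk to UVIC is available for September 1.',
];

$found = [];
foreach ($lines as $line) {
    if (preg_match($pattern, $line, $matches) === 1) {
        // The date is available by name (and also as $matches[1]).
        $found[] = $matches['date_available'];
    } else {
        $found[] = null; // no date found in this line
    }
}
print_r($found);
```

Note that with this pattern the ordinal suffix ends up inside the captured date (e.g. "September 2nd").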
In response to @EthanB's earlier solution: you don't seem to be capturing the ordinal suffix for the date (st, nd, rd, th). If that's the case and it's not required, you can make the pattern even shorter by leaving it out; there's no point in trying to match anything after the day number.
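A sketch of that shorter variant, assuming the suffix really isn't needed: it is the named-subpattern version from this answer with the trailing (?:st|nd|rd|th)? removed. preg_match only needs to find the leftmost match, so the captured date simply stops at the day number.

```php
<?php
// Named-subpattern version without the ordinal suffix:
// the captured date ends at the day number.
$pattern = '/\bavailable\s+(?:(?:for|in|at|since)\s+)?(?P<date_available>(?:immediately|now)|(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|June?|July?|Aug(?:ust)?|Oct(?:ober)?|(?:Sept|Nov|Dec)(?:ember)?)\s+\d{1,2})/i';

preg_match($pattern, 'Available since september 3rd.', $matches);
echo $matches['date_available'], "\n"; // september 3
```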
Upvotes: 0
Reputation: 4289
Using a wildcard word of 2 to 5 letters (a-z, case-insensitive) between "available" and the date, which also matches words like "on":
$regex = '/\bavailable[ ]*(?:[a-z]{2,5})?[ ]*' .
'(?P<date_available>immediately|now|' .
'(?:(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?' .
'|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?' .
// Jun(?:e)? accepts Jun/June; Sep(?:t(?:ember)?)? accepts Sep/Sept/September
'|Sep(?:t(?:ember)?)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)' .
'[ ]+[\d]+))' .
//end <date_available>
'(?:st|nd|rd|th)?/i';
Usage:
$lines = array(
    'Fully furnished self contained 2 bedroom suite just 5 minute walk to UVIC is available now.',
    'bedroom suite just 5 minute walk to UVIC is available on September 34.',
    'bedroom suite just 5 minute walk to somewhere is available on Apr 1.',
);
foreach ($lines as $line) {
    echo $line, "\n<br>\n";
    if (preg_match($regex, $line, $matches) === 1) {
        print_r($matches['date_available']);
    } else {
        echo "Does not match.";
    }
    echo "\n<br>\n";
}
Upvotes: 1
Reputation: 53
The following works with all of your examples, although I haven't included your named subpatterns, as I don't know the exact PHP syntax for them:
\bavailable\s+(?:(?:for|in|at|since)\s+)?((?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|June?|July?|Aug(?:ust)?|Sept(?:ember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)\s+\d{1,2}(?:st|nd|rd|th)?)
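For reference, PHP's preg_* functions accept named subpatterns written as (?P<name>...) (the (?<name>...) form works too), and preg_match then stores the capture under both the name and its numeric index. A minimal sketch wrapping this answer's pattern, with the name date_available borrowed from the question:

```php
<?php
// This answer's pattern, with the capture group named date_available.
$pattern = '/\bavailable\s+(?:(?:for|in|at|since)\s+)?(?P<date_available>(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|June?|July?|Aug(?:ust)?|Sept(?:ember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)\s+\d{1,2}(?:st|nd|rd|th)?)/i';

if (preg_match($pattern, 'Available in September 2nd', $matches) === 1) {
    // The named group is reachable by name and by its index (1).
    echo $matches['date_available'], "\n"; // September 2nd
    echo $matches[1], "\n";                 // September 2nd
}
```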
Upvotes: 0