Kenneth P.

Reputation: 1816

Need to extract a date from a string using preg_match

I have this string:

Fully furnished self contained 2 bedroom suite just 5 minute walk to UVIC is available for September 1.

Right now I'm using preg_match to extract the date. Here is the regex:

'/\bavailable\\s(?P<date_available>[?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?|immediately]+[\\s\d]+)[st|nd|rd|th]?/i'

Currently this regex can extract the date from strings such as:

Available september 1st.
Available September 2nd
available september 3rd
available september 4th
available sept 1

The output example is:

Array
(
    [0] => available September 1
    [date_available] => September 1
    [1] => September 1
)

But I cannot find a way to extract the date when the strings are:

Available for september 1st.
Available in September 2nd
available since september 3rd
available at september 4th

Can anyone help me deal with this? Thanks.

Upvotes: 0

Views: 104

Answers (3)

HuggieRich

Reputation: 53

I can't actually get yours to work at all. It looks as though you're using character classes with square brackets [ ] where you need grouping and alternation with parentheses ( ).
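To make the difference concrete, here is a minimal sketch. The helper name first_match is hypothetical and exists only for this illustration. Inside square brackets every character, including the pipe, is just one member of a set, so [st|nd|rd|th] matches a single character rather than a two-letter suffix.

```php
<?php
// Hypothetical helper for illustration: returns the whole match, or null if none.
function first_match(string $pattern, string $subject): ?string
{
    return preg_match($pattern, $subject, $m) === 1 ? $m[0] : null;
}

// Square brackets define a set of single characters; "|" is a literal inside them.
$class_result = first_match('/^[st|nd|rd|th]$/', 'st');   // no match: "st" is two characters
$pipe_result  = first_match('/^[st|nd|rd|th]$/', '|');    // matches: the pipe is in the set

// Parentheses group whole alternatives.
$group_result = first_match('/^(?:st|nd|rd|th)$/', 'st'); // matches "st"

var_dump($class_result, $pipe_result, $group_result);
```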

The following is probably the shortest I can get it, based on your requirements:

$pattern = '/\bavailable\s+(?:(?:for|in|at|since)\s+)?((?:immediately|now)|(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|June?|July?|Aug(?:ust)?|Oct(?:ober)?|(?:Sept|Nov|Dec)(?:ember)?)\s+?\d{1,2}(?:st|nd|rd|th)?)/i';

This doesn't include the named sub-pattern, because the required match will always be in $matches[1]. If you do want a named sub-pattern, you can add one:

$pattern = '/\bavailable\s+(?:(?:for|in|at|since)\s+)?(?P<date_available>(?:immediately|now)|(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|June?|July?|Aug(?:ust)?|Oct(?:ober)?|(?:Sept|Nov|Dec)(?:ember)?)\s+?\d{1,2}(?:st|nd|rd|th)?)/i';

In response to @EthanB's earlier solution: you don't seem to be capturing the ordinal suffix of the date (st, nd, rd, th). If the suffix isn't required, the pattern can be made even shorter by dropping it, since there's no point in trying to match anything after the day number.
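As a quick check, the named-sub-pattern version can be run against the strings from the question. This is only a sketch: the variable names $samples and $found are made up for the example, and a string that does not match is recorded as null.

```php
<?php
$pattern = '/\bavailable\s+(?:(?:for|in|at|since)\s+)?(?P<date_available>(?:immediately|now)|(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|June?|July?|Aug(?:ust)?|Oct(?:ober)?|(?:Sept|Nov|Dec)(?:ember)?)\s+?\d{1,2}(?:st|nd|rd|th)?)/i';

// Strings taken from the question, with and without a preposition.
$samples = array(
    'Available for september 1st.',
    'Available in September 2nd',
    'available since september 3rd',
    'available at september 4th',
    'available sept 1',
);

$found = array();
foreach ($samples as $sample) {
    // Store the named capture, or null when the line does not match.
    $found[] = preg_match($pattern, $sample, $matches) === 1
        ? $matches['date_available']
        : null;
}

print_r($found);
```

Note that this pattern spells the ninth month as Sept, so a bare "Sep 1" would not match.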

Upvotes: 0

EthanB

Reputation: 4289

With a wildcard for any word of 2 to 5 letters after "available" (so it also matches things like "on"):

$regex = '/\bavailable[ ]*(?:[a-z]{2,5})?[ ]*' .
    '(?P<date_available>immediately|now|' .
    '(?:(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?' .
    // "e" is optional so "Jun" matches; "Sep", "Sept" and "September" all match
    '|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?' .
    '|Sep(?:t(?:ember)?)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)' .
    '[ ]+[\d]+))' .
    // end of <date_available>; the ordinal suffix is matched but not captured
    '(?:st|nd|rd|th)?/i';

Usage:

$lines = array(
    'Fully furnished self contained 2 bedroom suite just 5 minute walk to UVIC is available now.',
    'bedroom suite just 5 minute walk to UVIC is available on September 34.',
    'bedroom suite just 5 minute walk to somewhere is available on Apr 1.',
    );

foreach ($lines as $line) {
    echo $line, "\n<br>\n";
    if (preg_match($regex, $line, $matches) === 1) {
        print_r($matches['date_available']);
    } else {
        echo "Does not match.";
    }
    echo "\n<br>\n";
}
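The second test line shows that the pattern happily accepts "September 34", because it only checks for digits. If that matters, the captured text could be validated afterwards. The following is a rough sketch, not part of the original answer: the helper name is_plausible_date is hypothetical, and the default year is an assumption, since the listings do not state one.

```php
<?php
// Hypothetical helper: true when "<month name> <day>" names a real calendar day.
// The year is an assumption; the listings in the question do not give one.
function is_plausible_date(string $text, int $year = 2024): bool
{
    static $months = array(
        'jan' => 1, 'feb' => 2, 'mar' => 3, 'apr' => 4, 'may' => 5, 'jun' => 6,
        'jul' => 7, 'aug' => 8, 'sep' => 9, 'oct' => 10, 'nov' => 11, 'dec' => 12,
    );

    // Split into the first three letters of the month and the day number.
    if (preg_match('/^([a-z]{3})[a-z]*\s+(\d{1,2})$/i', $text, $m) !== 1) {
        return false; // e.g. "now" or "immediately" carry no calendar date
    }

    $key = strtolower($m[1]);
    return isset($months[$key]) && checkdate($months[$key], (int) $m[2], $year);
}

var_dump(is_plausible_date('September 34'));
var_dump(is_plausible_date('Apr 1'));
```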

Upvotes: 1

HuggieRich

Reputation: 53

The following works with all of your examples, although I haven't included your named sub-pattern because I don't know the exact PHP syntax for it:

\bavailable\s+(?:(?:for|in|at|since)\s+)?((?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|June?|July?|Aug(?:ust)?|Sept(?:ember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)\s+\d{1,2}(?:st|nd|rd|th)?)
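For reference, PHP's PCRE functions accept the (?P<name>...) form for named sub-patterns, so the bare pattern above can be used roughly as follows. This is a sketch: the /.../i delimiters, the group name date_available (borrowed from the question), and the variable names are choices made for the example.

```php
<?php
// The pattern above, wrapped in /.../i delimiters, with the capturing group named.
$pattern = '/\bavailable\s+(?:(?:for|in|at|since)\s+)?'
    . '(?P<date_available>(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|June?|July?'
    . '|Aug(?:ust)?|Sept(?:ember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)'
    . '\s+\d{1,2}(?:st|nd|rd|th)?)/i';

// The sample listing from the question.
$text = 'Fully furnished self contained 2 bedroom suite just 5 minute walk to UVIC is available for September 1.';

// The named group is available both as $matches['date_available'] and $matches[1].
$date = preg_match($pattern, $text, $matches) === 1 ? $matches['date_available'] : null;

echo $date, "\n";
```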

Upvotes: 0
