Damien

Reputation: 5882

Combine multiple regular expressions into one and get all the matching ones

I have a list of regular expressions:

suresnes|suresne|surenes|surene
pommier|pommiers
^musique$
^(faq|aide)$
^(file )?loss( )?less$
paris
faq                              <<< "faq" matches twice (here and above)

My use case: each pattern that matches displays a link to my user, so several patterns can match the same input.

I test these patterns against a simple string of text, e.g. "live in paris" / "faq" / "pom"...

The simple way to do it is to loop over all the patterns with preg_match, but I will do that a lot on a performance-critical page, so this looks bad to me.

Here is what I have tried: combining all these expressions into one, with named groups:

preg_match("@(?P<group1>^(faq|aide|todo|paris)$)|(?P<group2>(paris)$)@im", "paris", $groups);

As you can see, each pattern is wrapped in a named group, (?P<GROUPNAME>PATTERN), and the groups are separated by a pipe |.

The result is not what I expect: only the first matching group is returned. It looks like matching stops as soon as one alternative matches.

What I want is the list of all the matching groups. preg_match_all does not help either.

Thanks!

Upvotes: 4

Views: 7083

Answers (3)

Raheel Hasan

Reputation: 6033

Try this approach:

#/ define input string
$str_1 = "{STRING HERE}";

#/ Define regex array
$reg_arr = array(
'suresnes|suresne|surenes|surene',
'pommier|pommiers',
'^musique$',
'^(faq|aide)$',
'^(file )?loss( )?less$',
'paris',
'faq'
);

#/ define a callback function to process Regex array
function cb_reg($reg_t)
{
    global $str_1;
    if (preg_match("/{$reg_t}/ims", $str_1, $matches)) {
        // array_map replaces each pattern with this return value.
        // Return the full match: $matches[1] only exists for patterns that
        // contain a capture group (e.g. 'paris' has none).
        return $matches[0];
    } else {
        return '';
    }
}

#/ Apply the callback to every pattern with array_map (instead of an explicit loop)
$results = array_map('cb_reg', $reg_arr); //returns regex results
$results = array_diff($results, array('')); //remove empty values returned

Basically, this is the fastest way I could think of.

  1. Combining hundreds of regexes into a single call is impractical: the resulting regex would be very complex to build and would have many ways to fail to match. Running the patterns one by one is one of the best ways to do it.

  2. In my opinion, combining a large number of regexes into one (if that is achievable at all) will be slower to execute with preg_match than this approach of a callback on an array. The key here is the callback applied to each array value, which I believe is the fastest way to handle arrays for this and similar situations in PHP.

Also note that a callback on an array is not the same as looping over the array in PHP code: the iteration happens internally in array_map. It still runs preg_match once per pattern, so the work stays linear in the number of patterns; benchmark both versions before relying on the difference.
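
For comparison, here is a minimal sketch of the same idea written as a plain foreach, without the global variable. The function name matching_patterns is made up for this example, and the '@' delimiter (taken from the question) assumes that no pattern contains '@'.

```php
<?php
// Hypothetical helper: returns the full match of every pattern that matches
// $subject, keyed by the pattern's index in $patterns.
function matching_patterns(array $patterns, string $subject): array
{
    $hits = array();
    foreach ($patterns as $i => $pattern) {
        // One preg_match call per pattern, case-insensitive.
        if (preg_match('@' . $pattern . '@i', $subject, $m) === 1) {
            $hits[$i] = $m[0];
        }
    }
    return $hits;
}

$patterns = array(
    'suresnes|suresne|surenes|surene',
    'pommier|pommiers',
    '^musique$',
    '^(faq|aide)$',
    '^(file )?loss( )?less$',
    'paris',
    'faq',
);

$hits = matching_patterns($patterns, 'faq');
// $hits is array(3 => 'faq', 6 => 'faq')
```

Keeping the pattern index as the key makes it easy to map each hit back to the link that should be displayed.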

Upvotes: 1

Erik Aronesty

Reputation: 12945

You can combine all of your regexes with "|" in between them. Then apply the techniques described at http://www.rexegg.com/regex-optimizations.html to optimize the result, collapse common expressions, etc.
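
A rough sketch of the combining step, using the patterns from the question. Each pattern is wrapped in a non-capturing group so that its own "|" and anchors stay local. Note that a plain alternation only tells you whether at least one pattern matches, not which ones, so it may be most useful as a cheap pre-check before testing the patterns individually. The variable names are made up for this example.

```php
<?php
$patterns = array(
    'suresnes|suresne|surenes|surene',
    'pommier|pommiers',
    '^musique$',
    '^(faq|aide)$',
    '^(file )?loss( )?less$',
    'paris',
    'faq',
);

// Wrap each pattern in (?:...) and join the pieces with "|".
// '@' is the delimiter, so this assumes no pattern contains '@'.
$wrapped = array_map(function ($p) {
    return '(?:' . $p . ')';
}, $patterns);
$combined = '@' . implode('|', $wrapped) . '@i';

$any  = preg_match($combined, 'live in paris'); // 1: 'paris' matches
$none = preg_match($combined, 'pom');           // 0: no pattern matches
```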

Upvotes: 0

Toto

Reputation: 91518

How about:

preg_match("@(?=(?P<group1>^(faq|aide|todo|paris)$))(?=(?P<group2>(paris)$))@im", "paris", $groups);
print_r($groups);

output:

Array
(
    [0] => 
    [group1] => paris
    [1] => paris
    [2] => paris
    [group2] => paris
    [3] => paris
    [4] => paris
)

The (?= ) construct is called a lookahead.

Explanation of the regex:

(?=                                     # start lookahead
    (?P<group1>                         # start named group group1 (capture group #1)
        ^                               # start of line (the m flag is set)
            (                           # start capture group #2
                faq|aide|todo|paris     # match any of faq, aide, todo or paris
            )                           # end capture group #2
        $                               # end of line
    )                                   # end of named group group1
)                                       # end of lookahead
(?=                                     # start lookahead
    (?P<group2>                         # start named group group2 (capture group #3)
        (                               # start capture group #4
            paris                       # paris
        )                               # end capture group #4
        $                               # end of line
    )                                   # end of named group group2
)                                       # end of lookahead
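
One caveat: consecutive lookaheads must all succeed, so the regex above only matches when every group matches (with "faq" as input, group2 fails and nothing is returned). A possible variant, sketched below, gives each lookahead an empty alternative so that it always succeeds and only the groups that really match get filled. The helper name matching_groups and the group names g0, g1, ... are made up for this example, and the '@' delimiter assumes no pattern contains '@'. Each lookahead still scans the subject on its own, so this is not guaranteed to be faster than a loop.

```php
<?php
// Hypothetical helper: returns the indexes of the patterns that match $subject,
// using a single preg_match call.
function matching_groups(array $patterns, string $subject): array
{
    $parts = array();
    foreach ($patterns as $i => $pattern) {
        // "(?= ... | )": the empty alternative lets the lookahead succeed even
        // when this pattern does not match, leaving its named group unset.
        // ".*?" lets the pattern match anywhere in the subject.
        $parts[] = '(?=.*?(?P<g' . $i . '>' . $pattern . ')|)';
    }
    // \A anchors the whole regex at the start; the s flag lets "." cross newlines.
    $regex = '@\A' . implode('', $parts) . '@is';

    preg_match($regex, $subject, $m);

    $hits = array();
    foreach ($patterns as $i => $pattern) {
        // Unmatched groups are missing or empty strings in $m, so a pattern
        // that can match the empty string would not be detected here.
        if (isset($m['g' . $i]) && $m['g' . $i] !== '') {
            $hits[] = $i;
        }
    }
    return $hits;
}

$patterns = array(
    'suresnes|suresne|surenes|surene',
    'pommier|pommiers',
    '^musique$',
    '^(faq|aide)$',
    '^(file )?loss( )?less$',
    'paris',
    'faq',
);

$hits = matching_groups($patterns, 'faq');
// $hits is array(3, 6)
```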

Upvotes: 7
