Senica Gonzalez

Reputation: 8182

PHP Regular Expression - Repeating Match of a Group

I have a string that may look something like this:

$r = 'Filed under: <a>Group1</a>, <a>Group2</a>';

Here is the regular expression I am using so far:

preg_match_all("/Filed under: (?:<a.*?>([\w|\d|\s]+?)<\/a>)+?/", $r, $matches);

I want the regular expression inside the () to continue to make matches as designated with the +? at the end. But it just won't do it. ::sigh::

Any ideas? I know there has to be a way to do this in one regular expression instead of breaking it up.

Upvotes: 10

Views: 9961

Answers (4)

Alan Moore

Reputation: 75232

Just for fun, here's a regex that will work with a single preg_match_all:

'%(?:Filed under:\s*+|\G</a>)[^<>]*+<a[^<>]*+>\K[^<>]+%'

Or, in a more readable format:

'%(?:
      Filed[ ]under: # your sentinel string ([ ] keeps the space literal under the x flag)
    |
      \G             # NEXT MATCH POSITION
      </a>           # an end tag
  )
  [^<>]*+          # some non-tag stuff
  <a[^<>]*+>       # an opening tag
  \K               # RESET MATCH START
  [^<>]+           # the tag contents
%x'

\G matches the position where the next match attempt would start, which is usually the spot where the previous successful match ended (but if the previous match was zero-length, it bumps ahead one more). That means the regex won't match a substring starting with </a> until after it's matched one starting with Filed under: at least once.

After the sentinel string or an end tag has been matched, [^<>]*+<a[^<>]*+> consumes everything up to and including the next start tag. Then \K spoofs the start position so the match (if there is one) appears to start after the <a> tag (it's like a positive lookbehind, but more flexible). Finally, [^<>]+ matches the tag's contents and brings the match position up to the end tag so \G can match.
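As a quick check, here is a minimal sketch that runs the one-line pattern against the string from the question. The second subject, with an unrelated link in front of the sentinel, is made up for illustration, and the variable names are arbitrary.

```php
<?php

// One-line form of the pattern; % is the delimiter, so </a> needs no escaping.
$pattern = '%(?:Filed under:\s*+|\G</a>)[^<>]*+<a[^<>]*+>\K[^<>]+%';

// The string from the question.
$r = 'Filed under: <a>Group1</a>, <a>Group2</a>';
preg_match_all($pattern, $r, $matches);

// Because of \K, the overall match ($matches[0]) is just the tag contents.
print_r($matches[0]);

// Hypothetical input: the link before the sentinel should be skipped.
$withNoise = '<a>Home</a> | Filed under: <a>Group1</a>, <a>Group2</a>';
preg_match_all($pattern, $withNoise, $noisy);
print_r($noisy[0]);
```

Both calls yield Group1 and Group2. Note that the pattern keeps going across any later links as long as only non-tag text separates them, so it assumes the list of groups is the last run of links in the string.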

But, as I said, this is just for fun. If you don't have to do the job in one regex, you're better off with a multi-step approach like the one @codaddict used; it's more readable, more flexible, and more maintainable.

\K reference
\G reference

EDIT: Although the references I gave are for the Perl docs, these features are supported by PHP, too--or, more accurately, by the PCRE lib. I think the Perl docs are a little better, but you can also read about this stuff in the PCRE manual.

Upvotes: 12

codaddict

Reputation: 455030

Try:

<?php

$r = 'Filed under: <a>Group1</a>, <a>Group2</a>, <a>Group3</a>, <a>Group4</a>';

if(preg_match_all("/<a.*?>([^<]*?)<\/a>/", $r, $matches)) {
    var_dump($matches[1]); 
}

?>

output:

array(4) {
  [0]=>
  string(6) "Group1"
  [1]=>
  string(6) "Group2"
  [2]=>
  string(6) "Group3"
  [3]=>
  string(6) "Group4"
}

EDIT:

Since you want to include the string 'Filed under' in the search to uniquely identify the match, you can try the following. I'm not sure whether it can be done using a single call to preg_match.

// Step 1: capture everything after 'Filed under:'
if (preg_match("/Filed under:(.*)$/", $r, $tail)) {
    // Step 2: pull the link texts out of that remainder only
    if (preg_match_all("/<a.*?>([^<]*?)<\/a>/", $tail[1], $matches)) {
        var_dump($matches[1]);
    }
}
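To see why the first step matters, here is a small sketch that wraps the two calls in a helper and feeds it a string with an unrelated link before the sentinel. The function name filedUnderGroups and the sample input are made up for illustration.

```php
<?php

// Hypothetical helper wrapping the two-step approach shown above.
function filedUnderGroups(string $html): array
{
    // Step 1: keep only what follows the sentinel. Single-line input is
    // assumed, since . does not match newlines without the s modifier.
    if (!preg_match('/Filed under:(.*)$/', $html, $tail)) {
        return [];
    }
    // Step 2: collect the text of every link in that remainder.
    preg_match_all('/<a.*?>([^<]*?)<\/a>/', $tail[1], $links);
    return $links[1];
}

$r = '<a>Home</a> | Filed under: <a>Group1</a>, <a>Group2</a>';
print_r(filedUnderGroups($r));
```

Running the second pattern on the whole string would also pick up Home; restricting it to the text after the sentinel leaves only Group1 and Group2.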

Upvotes: 8

ghostdog74

Reputation: 342373

<?php
$r = 'Filed under: <a>Group1</a>, <a>Group2</a>';
$s = explode("</a>",$r);
foreach ($s as $k){
    if ($k){
        $k=explode("<a>",$k);
        print "$k[1]\n";
    }
}

output

$ php test.php
Group1
Group2

Upvotes: 2

Anon.

Reputation: 59983

I want the regular expression inside the () to continue to make matches as designated with the +? at the end.

+? is a lazy quantifier - it will match as few times as possible. Since nothing follows the group in your pattern, that means just once.

If you want to match several times, you want a greedy quantifier - +.

Also note that your regex doesn't quite work - the repetition stops as soon as it reaches the comma between the tags, because the pattern doesn't account for it. That likely needs correcting.
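One more thing worth knowing before switching to a greedy +: in PCRE, a capture group inside a repeated group keeps only the text from its last iteration. The sketch below uses an adjusted version of the question's pattern (the optional ", " separator and the simplified character class are assumptions added for illustration) to show this.

```php
<?php

$r = 'Filed under: <a>Group1</a>, <a>Group2</a>';

// Greedy + on the outer group, plus an optional ", " so the repetition
// can get past the comma. This variation is illustrative, not the asker's.
$pattern = '/Filed under: (?:<a.*?>([\w\s]+?)<\/a>(?:, )?)+/';

preg_match($pattern, $r, $m);

echo $m[0], "\n"; // the whole string: the group repeated twice
echo $m[1], "\n"; // only the capture from the last iteration
```

Here $m[1] holds Group2 alone, even though the group matched twice. That is why the other answers either drop the repetition and let preg_match_all iterate, or use \G.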

Upvotes: 1
