Reputation: 8182
I have a string that may look something like this:
$r = 'Filed under: <a>Group1</a>, <a>Group2</a>';
Here is the regular expression I am using so far:
preg_match_all("/Filed under: (?:<a.*?>([\w|\d|\s]+?)<\/a>)+?/", $r, $matches);
I want the regular expression inside the () to continue to make matches as designated with the +? at the end. But it just won't do it. ::sigh::
Any ideas? I know there has to be a way to do this in one regular expression instead of breaking it up.
Upvotes: 10
Views: 9961
Reputation: 75232
Just for fun, here's a regex that will work with a single preg_match_all:
'%(?:Filed under:\s*+|\G</a>)[^<>]*+<a[^<>]*+>\K[^<>]+%'
Or, in a more readable format:
'%(?:
    Filed[ ]under:  # your sentinel string (space bracketed: x mode ignores bare whitespace)
  |
    \G              # NEXT MATCH POSITION
    </a>            # an end tag
)
[^<>]*+             # some non-tag stuff
<a[^<>]*+>          # an opening tag
\K                  # RESET MATCH START
[^<>]+              # the contents of the tag
%x'
\G matches the position where the next match attempt would start, which is usually the spot where the previous successful match ended (but if the previous match was zero-length, it bumps ahead one more). That means the regex won't match a substring starting with </a> until after it has matched one starting with Filed under: at least once.
After the sentinel string or an end tag has been matched, [^<>]*+<a[^<>]*+> consumes everything up to and including the next start tag. Then \K spoofs the start position so the match (if there is one) appears to start after the <a> tag (it's like a positive lookbehind, but more flexible). Finally, [^<>]+ matches the tag's contents and brings the match position up to the end tag so \G can match.
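To see the pattern in action, here is a minimal sketch that runs the one-line form (ending in [^<>]+, as in the commented version) against the sample string from the question. The second input, with a Home link in front of the sentinel, is a hypothetical example added only to show that tags before Filed under: are skipped; the variable names are illustrative too.

```php
<?php
// One-line form of the pattern, ending in [^<>]+ like the commented version.
$pattern = '%(?:Filed under:\s*+|\G</a>)[^<>]*+<a[^<>]*+>\K[^<>]+%';

// Sample string from the question.
$r = 'Filed under: <a>Group1</a>, <a>Group2</a>';
preg_match_all($pattern, $r, $matches);
print_r($matches[0]);    // Group1, Group2

// Hypothetical input with an unrelated link before the sentinel string.
$withNav = '<a href="/">Home</a> | Filed under: <a>Group1</a>, <a>Group2</a>';
preg_match_all($pattern, $withNav, $navMatches);
print_r($navMatches[0]); // Group1, Group2 (Home is not picked up)
```

Because \K resets the start of the match, the tag contents end up in $matches[0] (the whole match) rather than in a capture group.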
But, as I said, this is just for fun. If you don't have to do the job in one regex, you're better off with a multi-step approach like the one @codaddict used; it's more readable, more flexible, and more maintainable.
EDIT: Although the references I gave are for the Perl docs, these features are supported by PHP, too--or, more accurately, by the PCRE lib. I think the Perl docs are a little better, but you can also read about this stuff in the PCRE manual.
Upvotes: 12
Reputation: 455030
Try:
<?php
$r = 'Filed under: <a>Group1</a>, <a>Group2</a>, <a>Group3</a>, <a>Group4</a>';
if(preg_match_all("/<a.*?>([^<]*?)<\/a>/", $r, $matches)) {
    var_dump($matches[1]);
}
?>
output:
array(4) {
[0]=>
string(6) "Group1"
[1]=>
string(6) "Group2"
[2]=>
string(6) "Group3"
[3]=>
string(6) "Group4"
}
EDIT:
Since you want to include the string 'Filed under' in the search to uniquely identify the match, you can try the following. I'm not sure whether it can be done with a single call to preg_match.
// Since you want to match everything after 'Filed under'
if(preg_match("/Filed under:(.*)$/", $r, $matches)) {
    if(preg_match_all("/<a.*?>([^<]*?)<\/a>/", $matches[1], $matches)) {
        var_dump($matches[1]);
    }
}
Upvotes: 8
Reputation: 342373
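$r = 'Filed under: <a>Group1</a>, <a>Group2</a>';
// Split on the closing tag; each non-empty piece ends with one tag's text.
$s = explode("</a>", $r);
foreach ($s as $k) {
    if ($k) {
        // Everything after "<a>" is the tag's contents.
        $k = explode("<a>", $k);
        print "$k[1]\n";
    }
}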
output
$ php test.php
Group1
Group2
Upvotes: 2
Reputation: 59983
I want the regular expression inside the () to continue to make matches as designated with the +? at the end.
+? is a lazy quantifier - it will match as few times as possible. In other words, just once.
If you want to match several times, you want a greedy quantifier - +.
Also note that your regex doesn't quite work - the match fails as soon as it encounters the comma between the tags, because you haven't accounted for it. That likely needs correcting.
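As a rough illustration of the lazy/greedy difference, the sketch below runs both quantifiers against the sample string. The optional (?:, )? separator is my own addition to get past the comma, and the variable names are hypothetical. One caveat worth knowing: even when the group repeats, PCRE keeps only the last iteration of a repeated capturing group, so the greedy version still does not return every tag name.

```php
<?php
$r = 'Filed under: <a>Group1</a>, <a>Group2</a>';

// Lazy +? at the very end of the pattern: one iteration is enough, so it stops there.
preg_match('/Filed under: (?:(?:, )?<a.*?>([\w\s]+?)<\/a>)+?/', $r, $lazy);

// Greedy +: the group repeats for as long as it can.
preg_match('/Filed under: (?:(?:, )?<a.*?>([\w\s]+?)<\/a>)+/', $r, $greedy);

echo $lazy[0], "\n";   // Filed under: <a>Group1</a>
echo $greedy[0], "\n"; // Filed under: <a>Group1</a>, <a>Group2</a>
echo $greedy[1], "\n"; // Group2 (only the last iteration is captured)
```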
Upvotes: 1