Reputation: 75
Take the following string: "Marketing and Cricket on the Internet".
I would like to find all the possible matches for "Ma" -any text- "et" using a regex.
The regex Ma.*et returns "Marketing and Cricket on the Internet". The regex Ma.*?et returns "Market". But I'd like a regex that returns all three matches: "Market", "Marketing and Cricket" and "Marketing and Cricket on the Internet". Is that possible?
Thanks.
Upvotes: 4
Views: 481
Reputation: 75
Thanks guys, that really helped. Here's what I came up with for PHP:
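function preg_match_ubergreedy($regex, $text) {
    $matched = array();  // always return an array, even when nothing matches
    // Try every possible length for the "any text" part.
    for ($i = 0; $i < strlen($text); $i++) {
        // Swap the * quantifier for an exact repeat count, e.g. .*? becomes .{3}?
        $exp = str_replace("*", "{" . $i . "}", $regex);
        // preg_match returns 1 only when the pattern matched.
        if (preg_match($exp, $text, $matches) === 1) {
            $matched[] = $matches[0];
        }
    }
    return $matched;
}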
$text = "Marketing and Cricket on the Internet";
$matches = preg_match_ubergreedy("@Ma.*?et@is",$text);
Upvotes: 1
Reputation: 12103
For a more general regular expression, another option would be to recursively match the greedy regular expression against the previous match, discarding the first and last characters in turn to ensure that you're matching only a substring of the previous match. After matching "Marketing and Cricket on the Internet", we test both "arketing and Cricket on the Internet" and "Marketing and Cricket on the Interne" for submatches.
It goes something like this in C#...
// Needs: using System.Collections.Generic; using System.Text.RegularExpressions;
public static IEnumerable<Match> SubMatches(Regex r, string input)
{
    var result = new List<Match>();
    var matches = r.Matches(input);
    foreach (Match m in matches)
    {
        result.Add(m);
        if (m.Value.Length > 1)
        {
            // Drop the last character and look for shorter matches.
            string prefix = m.Value.Substring(0, m.Value.Length - 1);
            result.AddRange(SubMatches(r, prefix));
            // Drop the first character and look for shorter matches.
            string suffix = m.Value.Substring(1);
            result.AddRange(SubMatches(r, suffix));
        }
    }
    return result;
}
This version can, however, end up returning the same submatch several times. For example, it would find "Marmoset" twice in "Marketing and Marmosets on the Internet": first as a submatch of "Marketing and Marmosets on the Internet", then as a submatch of "Marmosets on the Internet".
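One way to avoid the duplicates is to remember which matched strings have already been seen. Here is a rough Python sketch of the same recursive idea, not a translation of the C# above. The name sub_matches and the seen parameter are made up for illustration, and it assumes you only care about the distinct matched text, not the match positions.

```python
import re

def sub_matches(pattern, text, seen=None):
    """Return the set of distinct strings found by recursive greedy matching."""
    if seen is None:
        seen = set()
    for m in re.finditer(pattern, text):
        value = m.group(0)
        if value in seen:
            # Already explored this string and everything inside it.
            continue
        seen.add(value)
        if len(value) > 1:
            sub_matches(pattern, value[:-1], seen)  # drop the last character
            sub_matches(pattern, value[1:], seen)   # drop the first character
    return seen

if __name__ == "__main__":
    found = sub_matches(r"Ma.*et", "Marketing and Marmosets on the Internet")
    for s in sorted(found, key=len):
        print(s)
```

Skipping strings that are already in the set also cuts down the repeated work, since each distinct match is expanded only once.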
Upvotes: 0
Reputation: 45578
As far as I know: No.
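But you could match non-greedily first and then generate a new regexp with a minimum-length quantifier to get the second match. "Market" has two characters between "Ma" and "et", so the next pattern asks for at least three. Like this: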
Ma.*?et
Ma.{3,}?et
...and so on...
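The "and so on" part can be automated. Below is a minimal Python sketch of that loop, assuming the pattern always has the shape head + any text + tail. The function name and the head/tail parameters are my own, not anything from a library. Note that re.search returns the leftmost match, so if "Ma" appears more than once in the text, later rounds may start from a later "Ma".

```python
import re

def growing_matches(text, head="Ma", tail="et"):
    """Collect successively longer lazy matches of head + any text + tail."""
    results = []
    min_len = 0  # minimum number of characters required between head and tail
    while True:
        pattern = re.escape(head) + ".{%d,}?" % min_len + re.escape(tail)
        m = re.search(pattern, text, re.DOTALL)
        if m is None:
            break
        results.append(m.group(0))
        # Next round, require a middle part at least one character longer.
        min_len = len(m.group(0)) - len(head) - len(tail) + 1
    return results

if __name__ == "__main__":
    for s in growing_matches("Marketing and Cricket on the Internet"):
        print(s)
```

For the string in the question this builds Ma.{0,}?et, then Ma.{3,}?et, then Ma.{18,}?et, and stops when the fourth pattern finds nothing.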
Upvotes: 2
Reputation: 74410
Sadly, this is not possible with a standard POSIX regex, which returns a single match (the best candidate, per the regex rules). To accomplish this you will need an extension feature, which may be available in the programming language you are using, assuming you are using this regex in a program.
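If no such feature is available, a program can fall back on brute force: test every substring against the pattern as a whole. This is only a sketch in Python, with a made-up function name. It makes a number of regex calls that grows with the square of the text length, so it is only reasonable for short strings.

```python
import re

def all_substring_matches(pattern, text):
    """Return every substring of text that the pattern matches in full."""
    compiled = re.compile(pattern, re.DOTALL)
    found = []
    for start in range(len(text)):
        for end in range(start + 1, len(text) + 1):
            # fullmatch(string, pos, endpos) only succeeds if the whole
            # slice text[start:end] matches the pattern.
            if compiled.fullmatch(text, start, end):
                found.append(text[start:end])
    return found

if __name__ == "__main__":
    for s in all_substring_matches(r"Ma.*et", "Marketing and Cricket on the Internet"):
        print(s)
```

Because each candidate substring is checked in full, it makes no difference here whether the quantifier is greedy or lazy.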
Upvotes: 0