Reputation: 6744
I am converting a SMAPI grammar to JSGF. They are pretty similar grammars used in different speech recognition systems. SMAPI uses a question mark the way the rest of the world does, to mean 0 or 1 of the previous thing. JSGF uses square brackets for this. So, I need to convert a string like
stuff?
to
[stuff]
and parenthesized strings like
((((stuff)? that)? I)? like)?
to
[[[[stuff] that] I] like]
I have to leave alone strings like
((((stuff) that) I) hate)
As Qtax pointed out, a more complicated example would be
(foo ((bar)? (baz))?)
being replaced by
(foo [[bar] (baz)])
Because of this, I have to extract every level of a parenthesized expression, see if it ends in a question mark, and replace the parens and question mark with square brackets if it does. I think Eric Strom's answer to this question is almost what I need. The problem is that when I use it, it returns the largest matched grouping, whereas I need to operate on each individual grouping.
This is what I have so far:
s/( \( (?: [^()?]* | (?0) )* \) ) \?/[$1]/xg
When matched against
((((stuff)? that)? I)? like)?
however, it produces only
[((((stuff)? that)? I)? like)]
Any ideas on how to do this?
Upvotes: 2
Views: 204
Reputation: 118625
You'll also want to look at ysth's solution to that question, and use a tool that is already available to solve this problem:
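use strict;
use warnings;
use Text::Balanced qw(extract_bracketed);

my $text = '((((stuff)? that)? I)? like)?';
for (my $i = 0; $i < length($text); $i++) {
    # Only try at an opening paren: extract_bracketed skips leading
    # whitespace by default, and that whitespace would be lost below.
    next unless substr($text, $i, 1) eq '(';
    my ($match, $remainder) = extract_bracketed(substr($text, $i), '()');
    if ($match && $remainder =~ /^\?/) {
        # Swap the outer parens and the trailing "?" for square brackets.
        substr($text, $i) =
            '[' . substr($match, 1, -1) . ']' . substr($remainder, 1);
        $i = -1;    # fixed: restart the scan from the beginning
    }
}
print "$text\n";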
Upvotes: 4
Reputation: 15194
In older Perl versions (pre 5.10), one could have used code assertions and dynamic regex for this:
...
my $s = '((((stuff)? that)? I)? like)?';

# recursive dynamic regex, we need
# to pre-declare lexical variables
my $rg;

# use a dynamically generated regex (??{..})
# and a code assertion (?{..})
$rg = qr{
    (?:                           # start expression
        (?> [^)(]+)               # (a) we don't see any (..) => atomic!
      |                           # OR
        (                         # (b) start capturing group for level
            \( (??{$rg}) \) \?    # oops, we found parentheses \(,\) w/sth
        )                         # in between and the \? at the end
        (?{ print "[ $^N ]\n" })  # if we got here, print the captured text $^N
    )*                            # done, repeat expression if possible
}xs;

$s =~ /$rg/;
...
During the match, the code assertion prints every matched group:
[ (stuff)? ]
[ ((stuff)? that)? ]
[ (((stuff)? that)? I)? ]
[ ((((stuff)? that)? I)? like)? ]
To use this according to your requirements, you could change the code assertion slightly, put the capturing parentheses at the right place, and save the matches in an array:
...
my @result;
my $rg;

$rg = qr{
    (?:
        (?> [^)(]+)
      |
        \( ( (??{$rg}) ) \) \? (?{ push @result, $^N })
    )*
}xs;

$s =~ /$rg/ && print map "[$_]\n", @result;
...
which prints:
[stuff]
[(stuff)? that]
[((stuff)? that)? I]
[(((stuff)? that)? I)? like]
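On Perl 5.10 and later, a similar list can probably be collected without code assertions, by putting a recursive subpattern inside a lookahead. The following is only a sketch under that assumption: `optional_groups` is a name made up for illustration, and the groups come back outermost first, which is the reverse of the order shown above.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the contents of every "( ... )?" group,
# outermost first, including groups nested inside other groups.
sub optional_groups {
    my ($s) = @_;
    my @result;
    # The lookahead keeps each match zero-width, so the scan moves on one
    # character at a time and nested groups are found as well.
    while ($s =~ m{
        (?= \(
            ( (?: [^()]++ | \( (?1) \) )*+ )
            \) \?
        )
    }xg) {
        push @result, $1;
    }
    return @result;
}

print "[$_]\n" for optional_groups('((((stuff)? that)? I)? like)?');
```

Groups that are not followed by a question mark are skipped, so a string like ((((stuff) that) I) hate) should give an empty list.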
Regards
rbo
Upvotes: 2
Reputation: 33918
You could solve it in a couple of ways. The simplest is to run your substitution repeatedly until it makes no more replacements, e.g.:
1 while s/( \( (?: [^()?]* | (?0) )* \) ) \?/[$1]/xg;
But that is highly inefficient for deeply nested strings, because every pass rescans the whole string.
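One thing to watch with the repeated form: in the expression from the question, group 1 includes the parentheses themselves, so as far as I can tell each pass would leave them inside the new brackets. A hedged variant that captures only the contents could look like the sketch below. The name `convert_by_repetition` is made up for illustration, the pattern needs Perl 5.10 or later for the recursion and possessive quantifiers, and it only handles parenthesized groups, not a bare `stuff?`.

```perl
use strict;
use warnings;

# Hypothetical helper: rewrites "( ... )?" as "[ ... ]", one nesting level
# per pass, and repeats until a pass changes nothing.
sub convert_by_repetition {
    my ($text) = @_;
    1 while $text =~ s{
        \(                                  # opening paren of a group
        ( (?: [^()]++ | \( (?1) \) )*+ )    # balanced contents, outer parens excluded
        \) \?                               # closing paren followed by a question mark
    }{[$1]}xg;
    return $text;
}

print convert_by_repetition('((((stuff)? that)? I)? like)?'), "\n";
print convert_by_repetition('(foo ((bar)? (baz))?)'), "\n";
```

Each pass converts the outermost remaining optional groups, so a string nested n levels deep takes about n passes plus one final pass that finds nothing.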
You could do it in one pass like this instead:
s{
    (?(DEFINE)
        (?<r> \( (?: [^()]++ | (?&r) )*+ \) )
    )
    ( \( )
    (?= (?: [^()]++ | (?&r) )*+ \) \? )
  |
    \) \?
}{
    $2 ? '[' : ']'
}gex;
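For comparison, the same one-pass idea can be sketched without regex recursion, using an explicit stack of open parentheses. This is not the substitution above, just an illustration under the assumption that a closing paren followed by a question mark always ends an optional group; `optional_groups_to_brackets` is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: rewrites "( ... )?" as "[ ... ]" in a single
# left-to-right scan, leaving all other parentheses alone.
sub optional_groups_to_brackets {
    my ($text) = @_;
    my @chars = split //, $text;
    my @open;    # positions in @out of "(" characters not yet closed
    my @out;
    for (my $i = 0; $i < @chars; $i++) {
        my $c = $chars[$i];
        if ($c eq '(') {
            push @open, scalar @out;    # where this "(" lands in @out
            push @out, $c;
        }
        elsif ($c eq ')' && @open) {
            my $start = pop @open;
            if ($i + 1 < @chars && $chars[$i + 1] eq '?') {
                $out[$start] = '[';     # retroactively open with a bracket
                push @out, ']';
                $i++;                   # skip the question mark
            }
            else {
                push @out, $c;
            }
        }
        else {
            push @out, $c;              # ordinary text, or an unmatched ")"
        }
    }
    return join '', @out;
}

print optional_groups_to_brackets('(foo ((bar)? (baz))?)'), "\n";
```

The stack plays the role of the lookahead in the regex version: instead of checking ahead whether a group will end in a question mark, it goes back and rewrites the opening paren once the end of the group is seen.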
Upvotes: 1