Reputation: 6970
What's the regex to get all matches of:
IF(.....);
I need to get the start and the end of that string. The content can itself contain ( and ), and there can also be nested IFs, e.g. (... IF (...) ....).
I need ONLY the content inside the IF.
Any idea?
That's because I need to take an Excel formula (an IF condition) and transform it into another language (JavaScript).
EDIT:
I tried
`/IF\s*(\(\s*.+?\s*\))/i` or `/IF(\(.+?\))/`
This doesn't work, because it only matches correctly when there are no ( or ) inside 'IF(...)'.
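A quick JavaScript check (the formula below is a made-up example) shows where the first pattern stops: the lazy `.+?` gives up at the first `)` that lets the rest of the pattern match, which here closes the inner SUM rather than the IF.

```javascript
// Made-up Excel-style formula with a nested function call inside the IF.
const formula = 'IF(SUM(A1,B1)>10,"big","small")';

// First pattern from the question: .+? is lazy, so it stops at the
// first ")" that lets the rest of the pattern match.
const lazy = /IF\s*(\(\s*.+?\s*\))/i;
const captured = formula.match(lazy)[1];

// The capture ends at the ")" closing SUM(...), not the one closing IF(...).
console.log(captured); // (SUM(A1,B1)
```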
Upvotes: 3
Views: 126
Reputation: 1475
It's not possible using only standard regular expressions. If you are using (or can use) .NET, you should look into Balanced Matching (.NET's balancing group definitions).
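JavaScript, the target language here, has neither recursion nor balancing groups in its regexes. What a plain regex can do is cover a fixed maximum depth by writing each level out by hand. A sketch, assuming at most one level of parentheses inside the IF (the pattern and the helper name `ifArgsDepth1` are illustrative, not taken from .NET or the question):

```javascript
// Matches IF( ... ) whose arguments contain at most ONE extra level of
// parentheses, e.g. IF(SUM(A1,B1)>10,1,0). Every extra level would have to
// be spelled out by hand, which is why this cannot cover arbitrary nesting.
const ifDepth1 = /IF\s*\(((?:[^()]|\([^()]*\))*)\)/gi;

function ifArgsDepth1(formula) {
  // matchAll needs the g flag; group 1 is the text inside IF( ... ).
  return Array.from(formula.matchAll(ifDepth1), (m) => m[1]);
}

console.log(JSON.stringify(ifArgsDepth1("IF(SUM(A1,B1)>10,1,0)"))); // ["SUM(A1,B1)>10,1,0"]
console.log(JSON.stringify(ifArgsDepth1("IF(A(B(C)))")));           // [] (two inner levels is too deep)
```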
Upvotes: 0
Reputation:
This is one way to do it in Perl. Any regex flavor that supports recursion should have this capability.
In this example, the fact that the correct parentheses are annotated (see the output) and balanced means it's possible to store the data in a structured way.
This in no way validates anything; it's just a quick solution.
use strict;
use warnings;
##
$/ = undef;               # slurp mode: read all of __DATA__ at once
my $str = <DATA>;
my ($lvl, $keyword) = ( 0, '(?:IF|ELSIF)' );  # One or more keywords
                                              # (using 2 in this example)
my $kwrx = qr/
    (\b $keyword \s*)          #1 - keyword capture group
    (                          #2 - recursion group
       \(                      # literal '('
       (                       #3 - content capture group
          (?:
             (?> [^()]+ )      # run of non-parenthesis chars (atomic)
           | (?2)              # or, recurse into group 2 (needs Perl 5.10+)
          )*
       )
       \)                      # literal ')'
    )
  | ( (?:(?!\b $keyword \s*).)+ )   #4 - text up to the next keyword
  | ($keyword)                      #5 - keyword without a balanced (...)
/sx;
##
print "\n$str\n- - -\n";
findKeywords ( $str );
exit 0;
##
sub findKeywords
{
    my ($str) = @_;
    while ($str =~ /$kwrx/g)
    {
        # Process keyword(s), recurse into its contents
        if (defined $2) {
            print "${1}[";
            $lvl++;
            findKeywords ( $3 );
        }
        # Process non-keyword text
        elsif (defined $4) {
            print "$4";
        }
        # Bare keyword (no balanced parentheses follow it)
        elsif (defined $5) {
            print "$5";
        }
    }
    # Close the bracket opened for this nesting level
    if ($lvl > 0) {
        print ']';
        $lvl--;
    }
}
__DATA__
IF( some junk IF (inner meter(s)) )
THEN {
IF ( its in
here
( IF (a=5)
ELSIF
( b=5
and IF( a=4 or
IF(its Monday) and there are
IF( ('lots') IF( ('of') IF( ('these') ) ) )
)
)
)
then its ok
)
ELSIF ( or here() )
ELSE (or nothing)
}
Output:
IF( some junk IF (inner meter(s)) )
THEN {
IF ( its in
here
( IF (a=5)
ELSIF
( b=5
and IF( a=4 or
IF(its Monday) and there are
IF( ('lots') IF( ('of') IF( ('these') ) ) )
)
)
)
then its ok
)
ELSIF ( or here() )
ELSE (or nothing)
}
- - -
IF[ some junk IF [inner meter(s)] ]
THEN {
IF [ its in
here
( IF [a=5]
ELSIF
[ b=5
and IF[ a=4 or
IF[its Monday] and there are
IF[ ('lots') IF[ ('of') IF[ ('these') ] ] ]
]
]
)
then its ok
]
ELSIF [ or here() ]
ELSE (or nothing)
}
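JavaScript regexes have no `(?2)`-style recursion, so the Perl pattern cannot be ported directly. A rough JavaScript sketch of the same annotation idea, using an explicit stack of open parentheses instead of regex recursion. It is not a line-for-line port: the helper name `annotateKeywords` is hypothetical, and only the IF/ELSIF keywords from the Perl example are recognized (case-sensitive, like the Perl pattern).

```javascript
// Hypothetical helper (not a port of the Perl regex): marks the "(" ")"
// pair that directly follows IF or ELSIF by turning it into "[" "]".
// A stack of open parentheses stands in for the regex recursion (?2).
function annotateKeywords(text) {
  const out = text.split("");
  const stack = []; // one entry per unmatched "(": { index, isKeyword }
  for (let i = 0; i < text.length; i++) {
    if (text[i] === "(") {
      // Keyword check: IF or ELSIF, case-sensitive as in the Perl pattern,
      // possibly followed by whitespace before the "(".
      const before = text.slice(0, i).replace(/\s+$/, "");
      stack.push({ index: i, isKeyword: /\b(?:IF|ELSIF)$/.test(before) });
    } else if (text[i] === ")" && stack.length > 0) {
      const open = stack.pop();
      if (open.isKeyword) {
        out[open.index] = "[";
        out[i] = "]";
      }
    }
  }
  // Any "(" still on the stack had no matching ")" and is left unchanged.
  return out.join("");
}

console.log(annotateKeywords("IF( some junk IF (inner meter(s)) )"));
// IF[ some junk IF [inner meter(s)] ]
console.log(annotateKeywords("ELSIF ( or here() )\nELSE (or nothing)"));
// ELSIF [ or here() ]
// ELSE (or nothing)
```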
Upvotes: 1
Reputation: 21258
I suspect you have a problem that is not suitable for regex matching. You want to do unbounded counting (so you can match opening and closing parentheses), and that is more than a regular expression can handle. Hand-rolling a parser to do the matching you want shouldn't be hard, though.
Essentially (pseudo-code):
Find "IF"
Ensure next character is "("
Initialise counter parendepth to 1
While parendepth > 0 and characters remain:
    place next character in ch
    if ch == "(":
        parendepth += 1
    if ch == ")":
        parendepth -= 1
Add in small amounts of "remember start" and "remember end" and you should be all set.
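A minimal JavaScript sketch of that pseudo-code with the "remember start" / "remember end" bookkeeping added. The name `findIfSpans` and the `{ start, end, content }` result shape are illustrative choices; allowing whitespace between IF and "(", matching case-insensitively, and skipping names like COUNTIF via `\b` are assumptions beyond the pseudo-code. It does not skip parentheses inside quoted strings such as `"a(b"`, which a real Excel formula parser would need to.

```javascript
// Hypothetical helper: returns one entry per IF( ... ) in `text`,
// including nested ones. start is the offset of the "I" in IF, end is one
// past the matching ")", content is the text between the parentheses.
function findIfSpans(text) {
  const results = [];
  // \b keeps COUNTIF( / SUMIF( from matching; \s* allows "IF (".
  const ifOpen = /\bIF\s*\(/gi;
  let m;
  while ((m = ifOpen.exec(text)) !== null) {
    const open = m.index + m[0].length - 1; // offset of the "("
    let depth = 1;
    let i = open + 1;
    // Count parentheses until they balance or the text runs out.
    while (depth > 0 && i < text.length) {
      if (text[i] === "(") depth += 1;
      else if (text[i] === ")") depth -= 1;
      i += 1;
    }
    // An IF( with no matching ")" is skipped.
    if (depth === 0) {
      results.push({ start: m.index, end: i, content: text.slice(open + 1, i - 1) });
    }
    // exec() resumes right after "IF(", so nested IFs are found next.
  }
  return results;
}

const spans = findIfSpans('=IF(A1>0,IF(B1>0,"pp","pn"),"n")');
for (const s of spans) console.log(s.start, s.end, s.content);
// 1 32 A1>0,IF(B1>0,"pp","pn"),"n"
// 9 27 B1>0,"pp","pn"
```

Because every IF( is scanned independently, the outer IF's content still contains the inner IF as text; converting to JavaScript could then process the innermost spans first.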
Upvotes: 3
Reputation: 10253
This should work and capture all the text between the parentheses, including both parentheses, as the first capture group:
/IF(\(.+?\))/
Please note that it won't match IF() (empty parentheses). If you want to match empty parentheses too, replace the + (match one or more) with a * (match zero or more):
/IF(\(.*?\))/
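A small JavaScript check of the two patterns (the test strings are made up). The second case shows a side effect worth knowing: with +, an empty IF() followed by another IF on the same line makes the lazy match run on to the next ).

```javascript
const plus = /IF(\(.+?\))/; // needs at least one char between the parentheses
const star = /IF(\(.*?\))/; // also accepts IF()

console.log(plus.test("IF()")); // false
console.log(star.test("IF()")); // true

// With +, the ")" right after "(" cannot end the match, so .+? keeps going.
const line = "IF() + IF(x)";
console.log(line.match(plus)[1]); // () + IF(x)
console.log(line.match(star)[1]); // ()
```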
--- EDIT
If you need to match formulas with parentheses (besides the outermost ones), you can use
/IF(\(.*\))/
which makes the regex greedy by removing the ?. This way it will match the longest string possible. Sorry, I wrongly assumed that you did not have any sub-parentheses.
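One caveat worth checking before relying on the greedy version: `.*` runs to the last `)` in the input, so anything after the IF that also ends in `)` gets swallowed too. A quick JavaScript illustration with made-up formulas:

```javascript
const greedy = /IF(\(.*\))/;

// Nested parentheses inside a single IF: the greedy match is correct.
const nested = "IF(SUM(A1,B1)>10)".match(greedy)[1];
console.log(nested); // (SUM(A1,B1)>10)

// Something after the IF that also ends in ")": .* overshoots into it.
const overshoot = "IF(A1>0,1,0)+SUM(B1:B3)".match(greedy)[1];
console.log(overshoot); // (A1>0,1,0)+SUM(B1:B3)
```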
Upvotes: 0
Reputation: 78443
Expanding on Paolo's answer, you might also need to worry about spaces and case:
/IF\s*(\(\s*.+?\s*\))/i
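A quick JavaScript check of that pattern (sample input made up). The `\s*` inside the capture group lets it accept extra spacing, but those spaces end up in the capture; moving the group inside the literal parentheses, a variation not in the original answer, returns just the condition. Like the other lazy patterns, both still stop at the first `)` and so do not handle nested parentheses.

```javascript
const spaced = /IF\s*(\(\s*.+?\s*\))/i;  // pattern from this answer
const trimmed = /IF\s*\(\s*(.+?)\s*\)/i; // variation: capture only the inside

const input = "if ( A1>0 )";
console.log(input.match(spaced)[1]);  // ( A1>0 )
console.log(input.match(trimmed)[1]); // A1>0
```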
Upvotes: 0