enfix

Reputation: 6970

Regular expression problem

What regex will match every occurrence of:

IF(.....);

I need to get the start and the end of that string. The content can itself contain ( and ), and the whole thing can be nested inside other expressions, e.g. (... IF (...) ....). I need ONLY the content inside IF. Any ideas?

The reason is that I need to take an Excel formula (an IF condition) and transform it into another language (JavaScript).

EDIT:
I tried

       `/IF\s*(\(\s*.+?\s*\))/i` or `/IF(\(.+?\))/`

This doesn't work because it only matches correctly when there are no ( or ) inside 'IF(...)'.

Upvotes: 3

Views: 126

Answers (5)

daalbert

Reputation: 1475

It's not possible using regular expressions alone. If you are using, or can use, .NET, you should look into Balanced Matching.

Upvotes: 0

user557597

Reputation:

This is one way to do it in Perl. Any regex flavor that allows recursion
should have this capability.
In this example, the fact that the correct parentheses are annotated
(see the output) and balanced means it's possible to store the data
in a structured way.
This in no way validates anything; it's just a quick solution.

use strict;
use warnings;

##
 $/ = undef;
 my $str = <DATA>;
 my ($lvl, $keyword) = ( 0, '(?:IF|ELSIF)' ); # One or more keywords
                                              # (using 2 in this example)    
 my $kwrx = qr/
   (\b $keyword \s*)        #1  - keyword capture group
   (                        #2  - recursion group
     \(      # literal '('
        (                   #3  - content capture group
          (?:
              (?>  [^()]+ )    # any non parenth char
            | (?2)             # or, recurse group 2
          )*
        )
     \)      # literal ')'
   )
 | ( (?:(?!\b $keyword \s*).)+ )   #4
 | ($keyword)                      #5
 /sx;

##
 print "\n$str\n- - -\n";
 findKeywords ( $str );
 exit 0;

##
sub findKeywords
{
  my ($str) = @_;
  while ($str =~ /$kwrx/g)
  {
    # Process keyword(s), recurse its contents

      if (defined $2) {
        print "${1}[";
        $lvl++;
        findKeywords ( $3 );
      }
    # Process non-keyword text

      elsif (defined $4) {
        print "$4";
      }
      elsif (defined $5) {
         print "$5";
      }
  }
  if ($lvl > 0) {
      print ']';
      $lvl--;
  }
}

__DATA__

  IF( some junk IF (inner meter(s)) )
  THEN {
    IF ( its in
         here
         ( IF (a=5)
           ELSIF
           ( b=5
             and IF( a=4 or
                     IF(its Monday) and there are
                     IF( ('lots') IF( ('of') IF( ('these') ) ) )
                   )
           )
         )
         then its ok
       ) 
    ELSIF ( or here() )
    ELSE (or nothing)
  } 

Output:

  IF( some junk IF (inner meter(s)) )
  THEN {
    IF ( its in
         here
         ( IF (a=5)
           ELSIF
           ( b=5
             and IF( a=4 or
                     IF(its Monday) and there are
                     IF( ('lots') IF( ('of') IF( ('these') ) ) )
                   )
           )
         )
         then its ok
       )
    ELSIF ( or here() )
    ELSE (or nothing)
  }

- - -

  IF[ some junk IF [inner meter(s)] ]
  THEN {
    IF [ its in
         here
         ( IF [a=5]
           ELSIF
           [ b=5
             and IF[ a=4 or
                     IF[its Monday] and there are
                     IF[ ('lots') IF[ ('of') IF[ ('these') ] ] ]
                   ]
           ]
         )
         then its ok
       ]
    ELSIF [ or here() ]
    ELSE (or nothing)
  }

Upvotes: 1

Vatine

Reputation: 21258

I suspect you have a problem that is not suitable for regex matching. You want to do unbounded counting (so you can match opening and closing parentheses), and this is more than a regexp can handle. Hand-rolling a parser to do the matching you want shouldn't be hard, though.

Essentially (pseudo-code):

Find "IF"
Ensure next character is "("
Initialise counter parendepth to 1
While parendepth > 0:
  place next character in ch
  if ch == "(":
    parendepth += 1
  if ch == ")":
    parendepth -= 1

Add in small amounts of "remember start" and "remember end" and you should be all set.
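
The pseudo-code above could be turned into JavaScript (the asker's target language) roughly as follows. This is a minimal sketch, not a full parser: the function name `findIfCalls` and the shape of the returned objects are invented for illustration, and it assumes parentheses never appear inside string literals in the formula.

```javascript
// Hypothetical helper (not part of the original answer): returns one entry per
// IF( ... ) in `formula`, with the indices of the opening and closing
// parentheses and the text between them.
// Assumption: no "(" or ")" occurs inside a quoted string in the formula.
function findIfCalls(formula) {
  const results = [];
  const keyword = /\bIF\s*\(/gi; // "IF", optional whitespace, then "("
  let m;
  while ((m = keyword.exec(formula)) !== null) {
    const open = m.index + m[0].length - 1; // "remember start": index of "("
    let depth = 1;                           // already inside one "("
    let i = open + 1;
    while (i < formula.length && depth > 0) {
      const ch = formula[i];
      if (ch === "(") depth += 1;
      else if (ch === ")") depth -= 1;
      i += 1;
    }
    if (depth !== 0) break;                  // unbalanced input: stop, don't guess
    const close = i - 1;                     // "remember end": index of matching ")"
    results.push({
      start: open,
      end: close,
      content: formula.slice(open + 1, close),
    });
    // The regex resumes right after the opening "(", so nested IFs are found too.
  }
  return results;
}

const sample = 'IF(A1>(B1+1), IF (C1=0, "x", "y"), SUM(D1:D3))';
for (const hit of findIfCalls(sample)) {
  console.log(hit.content);
}
// A1>(B1+1), IF (C1=0, "x", "y"), SUM(D1:D3)
// C1=0, "x", "y"
```

Reporting nested IFs as separate entries is a design choice for this sketch; a translator that only wants top-level IFs could instead set `keyword.lastIndex = close + 1` after each hit.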

Upvotes: 3

Paolo Stefan

Reputation: 10253

This should work and capture all the text between parentheses, including both parentheses, as the first capture group:

/IF(\(.+?\))/

Please note that it won't match IF() (empty parentheses): if you want to match empty parentheses too, you can replace the + (match one or more) with an * (match zero or more):

/IF(\(.*?\))/

--- EDIT

If you need to match formulas with parentheses (besides the outmost ones) you can use

 /IF(\(.*\))/

which makes the quantifier greedy by removing the ?. This way it will match the longest string possible. Sorry, I wrongly assumed that you did not have any sub-parentheses.
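
The difference between the lazy (`.+?`) and greedy (`.*`) patterns above can be checked with a small JavaScript sketch. The sample formulas and the helper name `firstGroup` are made up for illustration. It also shows that the greedy form runs to the last `)` on the line, so it overshoots when something parenthesised follows the IF.

```javascript
const lazy = /IF(\(.+?\))/;   // stops at the first ")"
const greedy = /IF(\(.*\))/;  // runs to the last ")" on the line

// Illustrative inputs, not taken from the question.
const nested = "IF(SUM(A1:A3)>0, 1, 2)";
const trailing = "IF(A1>0, 1, 2) + ABS(B1)";

// Hypothetical helper: first capture group of `re` in `text`, or null.
function firstGroup(re, text) {
  const m = re.exec(text);
  return m ? m[1] : null;
}

console.log(firstGroup(lazy, nested));     // (SUM(A1:A3)
console.log(firstGroup(greedy, nested));   // (SUM(A1:A3)>0, 1, 2)
console.log(firstGroup(greedy, trailing)); // (A1>0, 1, 2) + ABS(B1)
```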

Upvotes: 0

Denis de Bernardy

Reputation: 78443

Expanding on Paolo's answer, you might also need to worry about spaces and case:

/IF\s*(\(\s*.+?\s*\))/i

Upvotes: 0
