user904567

Reputation: 67

Get string pattern from this string using Regex

I have a string as shown below in my C# app.

Multiply(Sum(3,5,4), Division(4,5,5), Subtract(7,8,9))  

Sum(), Division() and Subtract() are different methods nested inside Multiply().

Is there any way to get each one separately, like Sum(3,5,4), Division(4,5,5), Subtract(7,8,9) and Multiply(), using C# Regex methods?

Sum, Division, Subtract and Multiply are constant keywords.

Upvotes: 2

Views: 520

Answers (4)

user557597

C# should be able to match balanced text with regular expressions, although .NET does this with balancing groups rather than true pattern recursion. The only problem, I think, is that it retains the outer match as a whole. Parsing the inner contents (between the parentheses) further requires a recursive function call that picks off the tokens each time.

I agree with @dasblinkenlight, though, about needing a decent parser. As he says, the complexity can quickly become considerable.
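For the .NET side, here is a minimal sketch of the balanced matching mentioned above, using balancing groups plus a recursive function call to pick off the inner calls. It is not a port of the Perl regex further down. The class name BalancedCalls, the Flatten helper and the group names name, args and depth are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class BalancedCalls
{
    // Matches name( ... ) where the parentheses inside are balanced.
    // "depth" is only used as a counter stack (hypothetical group name).
    private static readonly Regex Call = new Regex(@"
        \b(?<name>\w+)\s*\(        # function name and opening parenthesis
        (?<args>
            (?: [^()]              # any character that is not a parenthesis
              | (?<depth>\()       # an opening parenthesis pushes onto the stack
              | (?<-depth>\))      # a closing parenthesis pops from the stack
            )*
            (?(depth)(?!))         # fail if an opening parenthesis is still unclosed
        )
        \)                         # the closing parenthesis of this call
        ", RegexOptions.IgnorePatternWhitespace);

    // Returns every call in the text, outer call first, then the calls nested in it.
    public static List<string> Flatten(string text)
    {
        var found = new List<string>();
        foreach (Match m in Call.Matches(text))
        {
            found.Add(m.Value);
            found.AddRange(Flatten(m.Groups["args"].Value));
        }
        return found;
    }
}
```

Calling BalancedCalls.Flatten on the string from the question should return the whole Multiply(...) call first, followed by Sum(3,5,4), Division(4,5,5) and Subtract(7,8,9).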

The regex below is written for Perl. The overall approach carries over to .NET, but .NET has no (?1) recursion and no (?|...) branch reset, so those two constructs would have to be replaced in a port.
As you can see, the regex works like a sieve: the general form is adhered to, but
only commas and digits are handled between math tokens, and everything else falls through.

But if this is the only thing you care about, then it should work. You'll notice that even though you can parse the string into a data structure (as below), using that structure internally requires yet another recursive "parse" over the data structure (albeit an easier one). If it is only for display or statistical purposes, then it's not a problem.

The expanded regex:

 {
    (                                      #1 - Recursion group 1                            
      \b(\w+)\s*                                #2 - Math token
      \(                                        #  - Open parenth                   
         (                                        #3 - Capture between parenth's
           (?:  (?> (?: (?!\b\w+\s*\(|\)) . )+ )     # - Get all up to next math token or close parenth
              | (?1)                                 # - OR, recurse group 1
           )*                                        # - Optionally do many times 
         )                                        # - End capture 3
      \)                                        # - Close parenth
    )                                      # - End recursion group 1
    \s*(\,?)                               #4 - Capture optional comma ','

  |                                    # OR,
                                       # (Here, it is only getting commas and digits, ignoring the rest.
                                       #  Commas ',' are escaped to make them stand out)
    \s*                                       
    (?|                                    # - Start branch reset
        (\d+)\s*(\,?)                          #5,6 - Digits then optional comma ','
      | (?<=\,)()\s*(\,|\s*$)                  #5,6 - Comma behind. No digit then, comma or end
    )                                      # - End branch reset
 }xs;   # Options: expanded, single-line

Here is a rapid prototype in Perl (easier than C#):

 use Data::Dumper;


#//
 my $regex = qr{(\b(\w+)\s*\(((?:(?>(?:(?!\b\w+\s*\(|\)).)+)|(?1))*)\))\s*(\,?)|\s*(?|(\d+)\s*(\,?)|(?<=\,)()\s*(\,|\s*$))}s;


#//
 my $sample = ', asdf Multiply(9, 4, 3, hello,  _Sum(3,5,4,) , Division(4, Sum(3,5,4), 5), ,, Subtract(7,8,9))';

 print_math_toks( 0, $sample );

 my @array;
 store_math_toks( 0, $sample, \@array );
 print Dumper(\@array);


#//
 sub print_math_toks
 {
    my ($cnt, $segment) = @_;
    while ($segment  =~ /$regex/g )
    {
      if (defined $5) {
         next if $cnt < 1;
         print "\t"x($cnt+1), "$5$6\n";
      }
      else {
         ++$cnt;
         print "\t"x$cnt, "$2(\n";
         my $post = $4;

         $cnt = print_math_toks( $cnt, $3 );

         print "\t"x$cnt, ")$post\n";
         --$cnt;
      }
    }
    return $cnt;
 }


 sub store_math_toks
 {
    my ($cnt, $segment, $ary) = @_;
    while ($segment  =~ /$regex/g )
    {
      if (defined $5) {
         next if $cnt < 1;
         if (length $5) {
            push (@$ary, $5);
         }
         else {
            push (@$ary, '');
         }
      }
      else {
         ++$cnt;
         my %hash;
         $hash{$2} = [];
         push (@$ary, \%hash);

         $cnt = store_math_toks( $cnt, $3, $hash{$2} );

         --$cnt;
      }
    }
    return $cnt;
 }

Output:

        Multiply(
                9,
                4,
                3,
                _Sum(
                        3,
                        5,
                        4,

                ),
                Division(
                        4,
                        Sum(
                                3,
                                5,
                                4
                        ),
                        5
                ),
                ,
                ,
                Subtract(
                        7,
                        8,
                        9
                )
        )
$VAR1 = [
          {
            'Multiply' => [
                            '9',
                            '4',
                            '3',
                            {
                              '_Sum' => [
                                          '3',
                                          '5',
                                          '4',
                                          ''
                                        ]
                            },
                            {
                              'Division' => [
                                              '4',
                                              {
                                                'Sum' => [
                                                           '3',
                                                           '5',
                                                           '4'
                                                         ]
                                              },
                                              '5'
                                            ]
                            },
                            '',
                            '',
                            {
                              'Subtract' => [
                                              '7',
                                              '8',
                                              '9'
                                            ]
                            }
                          ]
          }
        ];

Upvotes: 0

Sergey Kalinichenko

Reputation: 727077

You cannot handle arbitrary nesting with RegExp: it is impossible even in theory because of the limitations of the regular-expression model.

What you need in this case is a parser. It does not require much work to build a very simple recursive descent parser manually, but once the complexity becomes considerable, you should switch to a parser generator. My personal favorite is ANTLR, but you have lots of other choices.
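To give an idea of how little code a hand-written recursive descent parser needs for this input, here is a minimal sketch. It assumes a grammar of numbers, names and name(arg, arg, ...) calls only. The Node and CallParser types are hypothetical names chosen for this example.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical tree node: a call has a Name and Args, a plain value has only a Name.
public sealed class Node
{
    public string Name = "";
    public bool IsCall;
    public List<Node> Args = new List<Node>();

    public override string ToString() =>
        IsCall ? Name + "(" + string.Join(",", Args) + ")" : Name;
}

public sealed class CallParser
{
    private readonly string _s;
    private int _pos;

    private CallParser(string s) { _s = s; }

    public static Node Parse(string s)
    {
        var p = new CallParser(s);
        Node root = p.ParseExpr();
        p.SkipSpaces();
        if (p._pos != s.Length) throw new FormatException("Unexpected text at " + p._pos);
        return root;
    }

    private void SkipSpaces()
    {
        while (_pos < _s.Length && char.IsWhiteSpace(_s[_pos])) _pos++;
    }

    // expr := word | word '(' [ expr { ',' expr } ] ')'
    private Node ParseExpr()
    {
        SkipSpaces();
        int start = _pos;
        while (_pos < _s.Length && (char.IsLetterOrDigit(_s[_pos]) || _s[_pos] == '_')) _pos++;
        if (_pos == start) throw new FormatException("Expected a name or number at " + start);

        var node = new Node { Name = _s.Substring(start, _pos - start) };
        SkipSpaces();
        if (_pos >= _s.Length || _s[_pos] != '(') return node; // plain value, not a call

        node.IsCall = true;
        _pos++; // consume '('
        SkipSpaces();
        if (_pos < _s.Length && _s[_pos] == ')') { _pos++; return node; } // empty argument list

        while (true)
        {
            node.Args.Add(ParseExpr()); // recursion handles any nesting depth
            SkipSpaces();
            if (_pos >= _s.Length) throw new FormatException("Missing ')'");
            char c = _s[_pos++];
            if (c == ')') return node;
            if (c != ',') throw new FormatException("Expected ',' or ')' at " + (_pos - 1));
        }
    }
}
```

CallParser.Parse applied to the string from the question gives a Multiply node whose Args list holds the Sum, Division and Subtract nodes. Once operators, precedence or strings are added to the grammar, a parser generator becomes the more comfortable route.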

Upvotes: 1

Paul Eastlund

Reputation: 6943

If the nesting is arbitrarily deep, you should do this iteratively with something like Regex.Matches() and Regex.Replace().

Make a copy of your whole string. Use ([a-zA-Z]+\([0-9, ]*\))(, )? as the regular expression. That will match all of the lowest-level function calls, i.e. all of the leaf nodes of your call graph.

Call Regex.Matches() to extract all of the matches, then call Regex.Replace() to remove them from the string copy. That gets rid of all the leaf nodes of the call graph. Call Matches() and Replace() again to get rid of the next level of calls up, and keep repeating until the string copy is empty.
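The loop described above could be sketched roughly like this. The pattern is the one from this answer. The LeafPeeler class and its Peel method are hypothetical names, and the early exit for leftover text is an added safety assumption so the loop cannot spin forever.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LeafPeeler
{
    // A name followed by parentheses that contain only digits, commas and spaces.
    private static readonly Regex Leaf = new Regex(@"([a-zA-Z]+\([0-9, ]*\))(, )?");

    // Returns one list of calls per pass, innermost calls first.
    public static List<List<string>> Peel(string input)
    {
        var levels = new List<List<string>>();
        string work = input; // the working copy that gets stripped down

        while (work.Length > 0)
        {
            MatchCollection matches = Leaf.Matches(work);
            if (matches.Count == 0) break; // leftover text the pattern cannot handle

            var level = new List<string>();
            foreach (Match m in matches)
                level.Add(m.Groups[1].Value); // the call without its trailing ", "
            levels.Add(level);

            work = Leaf.Replace(work, ""); // remove this level's leaf calls
        }
        return levels;
    }
}
```

For the string from the question, the first pass yields Sum(3,5,4), Division(4,5,5) and Subtract(7,8,9), and the second pass yields Multiply(). Outer calls come back with their nested arguments already stripped, so keep the original string if you need the full text of each call.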

Upvotes: 1

shift66

Reputation: 11958

Yes, as long as you don't pass another method call as a parameter to the inner methods
(like Sum(2, Sum(3,2), 4)).
In that case you can use this pattern:
^\w+\((.*)\)$
Then take group 1 (the (.*) group), which holds the parameters (Sum(3,5,4), Division(4,5,5), Subtract(7,8,9)), and apply this pattern to that group to find each parameter:
\w+\([^()]*\)
(A greedy \w+\(.*\) would not work here, because .* runs on to the last closing parenthesis and returns all three calls as a single match.)
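A minimal C# sketch of this two-step idea, assuming the inner calls contain no nested calls. The inner pattern here uses [^()]* instead of a greedy .* so that each call is matched on its own. FlatSplitter and InnerCalls are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class FlatSplitter
{
    // Returns the calls found directly inside the outer call, or an empty list
    // if the input does not look like name(...).
    public static List<string> InnerCalls(string input)
    {
        var calls = new List<string>();

        // Step 1: group 1 captures everything between the outer parentheses.
        Match outer = Regex.Match(input.Trim(), @"^\w+\((.*)\)$");
        if (!outer.Success) return calls;

        // Step 2: pick off each name(...) whose arguments contain no parentheses.
        foreach (Match m in Regex.Matches(outer.Groups[1].Value, @"\w+\([^()]*\)"))
            calls.Add(m.Value);

        return calls;
    }
}
```

For the string in the question this should give Sum(3,5,4), Division(4,5,5) and Subtract(7,8,9).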

If your Multiply method may contain further nested method calls, a regexp can't help you. In that case you should count parentheses to see which one was closed where.
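Counting parentheses can be sketched as below: walk the argument text once, track the nesting depth, and split only on commas seen at depth zero. ArgSplitter and SplitTopLevel are hypothetical names, and the sketch assumes the parentheses in the input are balanced.

```csharp
using System;
using System.Collections.Generic;

public static class ArgSplitter
{
    // Splits "2, Sum(3,2), 4" into its top-level arguments.
    public static List<string> SplitTopLevel(string args)
    {
        var parts = new List<string>();
        int depth = 0; // how many parentheses are currently open
        int start = 0; // where the current argument begins

        for (int i = 0; i < args.Length; i++)
        {
            char c = args[i];
            if (c == '(') depth++;
            else if (c == ')') depth--;
            else if (c == ',' && depth == 0)
            {
                // A comma outside all parentheses ends the current argument.
                parts.Add(args.Substring(start, i - start).Trim());
                start = i + 1;
            }
        }
        parts.Add(args.Substring(start).Trim()); // the last argument
        return parts;
    }
}
```

Applying SplitTopLevel again to the text inside each returned call handles deeper levels. Note that an empty input produces a single empty argument, which callers may want to filter out.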

Upvotes: 0
