Jess

Reputation: 8700

Complex-ish regex problem

I need to parse writeln("test"); out of a string.
I was using (?<type>writeln)\((?<args>[^\)]*)\); as the regex, but it isn't perfect: if you try to parse writeln("heloo :)"); or something similar, the regex won't match, because [^\)]* stops at the ')' inside the quotes. Is there a way to tell the regex that a ')' inside quotation marks should be ignored, so that it looks for the next ')' instead?

Thanks,
Max

Upvotes: 1

Views: 103

Answers (3)

Peet Brits

Reputation: 3265

The following will match patterns like writeln("hello :) \"world\"!");

string regex = "(?<type>writeln)\\(\"(?<args>(\\\\\"|[^\"])*)\"\\);";

I'm assuming this is only for single arguments.
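For anyone who wants to try the idea outside .NET, here is a minimal Python sketch of the same approach. Two things are my own assumptions, not part of the pattern above: Python spells named groups (?P<name>...), and I swapped in a slightly different inner alternation, (?:\\.|[^"\\])*, so that a backslash is always consumed together with the character it escapes. The helper name parse_writeln is hypothetical.

```python
import re

# Python writes named groups as (?P<name>...) instead of (?<name>...).
# Inside the quotes we accept either an escaped character (backslash + any
# character) or any character that is neither a quote nor a backslash.
WRITELN = re.compile(r'(?P<type>writeln)\("(?P<args>(?:\\.|[^"\\])*)"\);')


def parse_writeln(text):
    """Return (type, args) for the first writeln("...") call, or None.

    Hypothetical helper for illustration; handles a single string argument.
    """
    m = WRITELN.search(text)
    if m is None:
        return None
    return m.group("type"), m.group("args")


print(parse_writeln('writeln("heloo :)");'))
print(parse_writeln(r'writeln("hello :) \"world\"!");'))
```

The captured args keep the backslashes as written in the input; unescaping them would be a separate step.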

Upvotes: 1

Daren Thomas

Reputation: 70344

Why not write a little parser for this? Just loop through the characters and have a simple state machine for parsing.
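A minimal sketch of what such a state machine might look like, in Python. The function name parse_call and the rule that the call must end in ");" are my assumptions based on the question, and it only tracks double-quoted strings with backslash escapes.

```python
def parse_call(text, name="writeln"):
    """Return the raw argument text of name(...); or None if it is not found.

    Hypothetical helper: a ')' only ends the call when we are outside a
    string literal.
    """
    start = text.find(name + "(")
    if start == -1:
        return None
    i = start + len(name) + 1  # index of the first character after '('
    args_start = i
    in_string = False
    escaped = False
    while i < len(text):
        ch = text[i]
        if in_string:
            if escaped:
                escaped = False      # previous char was a backslash; take ch literally
            elif ch == "\\":
                escaped = True       # next character is escaped
            elif ch == '"':
                in_string = False    # closing quote
        elif ch == '"':
            in_string = True         # opening quote
        elif ch == ")":
            # Assumption: the call must be terminated by ';' as in the question.
            if text[i + 1:i + 2] == ";":
                return text[args_start:i]
            return None
        i += 1
    return None  # ran off the end: unterminated string or missing ')'


print(parse_call('writeln("heloo :)");'))
```

Extending it to nested calls would mean adding a depth counter for '(' and ')' outside strings.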

This kind of problem is hard to solve with regular expressions because the grammar is not regular. Search SO for "parsing HTML with regex" ;)

BUT: If you control your input to a certain extent, then you might just be able to get away with regexes. See other answers here for "good enough" ways to do it.

This basically boils down to:

  1. decide how deep the rabbit hole goes (how much "recursion" you want to simulate)
  2. create an alternative (branch) regex for each such recursion
  3. stab your eyes out the next time you need to change the regex
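The steps above can be sketched roughly as follows, in Python. This is only an illustration under my own assumptions: the function names are hypothetical, named groups use Python's (?P<name>...) spelling, and strings are double-quoted with backslash escapes. Each extra level of nesting wraps the previous pattern in one more parenthesised branch.

```python
import re


def nested_parens_pattern(depth):
    """Build regex source for text whose parentheses nest at most `depth` deep."""
    # A quoted string with backslash escapes; parentheses inside it are ignored.
    string = r'"(?:\\.|[^"\\])*"'
    # Level 0: strings and plain characters, no parentheses outside strings.
    pattern = rf'(?:{string}|[^()"])*'
    for _ in range(depth):
        # One more branch per level: a parenthesised group of the previous level.
        pattern = rf'(?:{string}|[^()"]|\({pattern}\))*'
    return pattern


def compile_writeln(depth):
    """Compile a writeln(...); matcher that tolerates `depth` levels of nesting."""
    return re.compile(
        r'(?P<type>writeln)\((?P<args>' + nested_parens_pattern(depth) + r')\);'
    )


m = compile_writeln(2).match('writeln(f(g(1)), ")");')
print(m.group("args") if m else None)
```

Input nested deeper than the chosen depth simply fails to match, which is the point where step 3 kicks in.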

I do this all the time. And I hate myself for it!

Upvotes: 2

cletus

Reputation: 625347

You've encountered the sort of problem you get using regexes to parse non-regular languages.

That being said, try:

(?<type>writeln)\((?<args>("[^"]*"|))\);

It's not perfect but nothing will be.

Upvotes: 1
