Rob P.

Reputation: 15071

Remove Everything Between Two Characters As Long As They Aren't Inside Some Other Characters

Basically, my goal is to remove everything inside parentheses, except for strings that are inside double quotes.

I was following the code here: Remove text in-between delimiters in a string (using a regex?)

That works great, but I have the additional requirement of not removing ()s if they are inside "". Is that something that can be done with a regular expression? I feel like I'm dangerously close to needing another approach, like a true parser.

This is what I've been using:

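// Requires: using System.Text.RegularExpressions;
string RemoveBetween(string s, char begin, char end)
{
    // Escape both delimiters so characters such as '(' are matched literally;
    // the lazy ".*?" stops at the first closing delimiter.
    string pattern = Regex.Escape(begin.ToString()) + ".*?" + Regex.Escape(end.ToString());
    return Regex.Replace(s, pattern, string.Empty);
}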
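
To make the limitation concrete, here is a rough Java sketch of the same lazy-match approach. It is written in Java only for illustration; the class name LazyRemoveDemo and the sample input are assumptions, not part of my real code. It shows that the delimiters are removed even when they sit inside a quoted string.

```java
import java.util.regex.Pattern;

class LazyRemoveDemo {
    // Illustrative Java counterpart of RemoveBetween (hypothetical helper).
    static String removeBetween(String s, char begin, char end) {
        // Quote both delimiters so they are matched literally.
        String pattern = Pattern.quote(String.valueOf(begin))
                + ".*?"
                + Pattern.quote(String.valueOf(end));
        return s.replaceAll(pattern, "");
    }

    public static void main(String[] args) {
        // The "(hi)" inside the quotes is removed too, which is the unwanted behavior.
        System.out.println(removeBetween("say \"(hi)\" (drop me) done", '(', ')'));
    }
}
```

The regex has no notion of being inside quotes, so the quoted "(hi)" is stripped along with the unquoted "(drop me)".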

Upvotes: 5

Views: 1572

Answers (3)

Mark Sowul

Reputation: 10600

.NET regexes are more powerful than most other regex flavors, so you can surely do what you want. Take a look at this article, which matches balanced parentheses; that is essentially the same problem as yours, but with parentheses rather than quotes.

http://blogs.msdn.com/bclteam/archive/2005/03/15/396452.aspx

Upvotes: 3

Bohemian

Reputation: 425013

I don't speak C#, but here's the Java implementation:

input.replaceAll("(?<=\\().*?(?=[\"()])(\"([^\"]*)\")?.*(?=\\))", "$2");

This produces the following results:

"foo (bar \"hello world\" foo) bar" --> "foo (hello world) bar"
"foo (bar foo) bar" --> "foo () bar"

It wasn't clear whether you wanted to preserve the quotes. If you do, use $1 instead of $2.

Now that you've got a working regex, you should be able to make it work for you in C#.

Upvotes: 3

Andrew Shepherd

Reputation: 45252

It's risky to say "No you can't" on this forum, because somebody will go and ruin it by providing a working answer. :-)

But I will say that this would be really stretching regular expressions, and your problem elegantly lends itself to Automata-based programming.

Personally, I'm happier maintaining a 20-line finite state machine than a 10-character regular expression.
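
As a rough illustration of the state-machine idea, here is a minimal Java sketch. It rests on several assumptions: the class name ParenStripper is hypothetical, the delimiters themselves are dropped along with their contents, quoted strings are always copied through verbatim (quotes included), and neither nested delimiters nor escaped quotes are handled.

```java
class ParenStripper {
    // Hypothetical helper; the exact behavior is an assumption, see the note above.
    static String removeBetween(String s, char begin, char end) {
        StringBuilder out = new StringBuilder();
        boolean inQuotes = false; // currently inside a "..." run
        boolean inDelims = false; // currently inside a begin...end run
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '"') {
                // Quotes toggle the quoted state and are always kept.
                inQuotes = !inQuotes;
                out.append(c);
            } else if (inQuotes) {
                // Quoted text is kept, even when it contains delimiters.
                out.append(c);
            } else if (!inDelims && c == begin) {
                inDelims = true;  // start dropping characters
            } else if (inDelims && c == end) {
                inDelims = false; // stop dropping characters
            } else if (!inDelims) {
                out.append(c);    // ordinary text outside the delimiters
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(removeBetween("foo (bar \"hello world\" foo) bar", '(', ')'));
        System.out.println(removeBetween("say \"(hi)\" (drop me) done", '(', ')'));
    }
}
```

Each character is inspected once, and the two boolean flags are the entire state, so a change in requirements (say, keeping the parentheses or supporting nesting with a depth counter) is a local change to one branch.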

Upvotes: 2
