Reputation: 15071
Basically, my goal is to remove everything inside ()'s except for strings that are inside "".
I was following the code here: Remove text in-between delimiters in a string (using a regex?)
And that works great, but I have the additional requirement of not removing ()s if they are in "". Is that something that can be done with a regular expression? I feel like I'm dangerously close to needing another approach, like a true parser.
This is what I've been using:
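// Requires: using System.Text.RegularExpressions;
string RemoveBetween(string s, char begin, char end)
{
    // Escape the delimiters so characters like ( and ) are matched literally;
    // .*? is lazy, so each match stops at the nearest end delimiter.
    Regex regex = new Regex(string.Format("{0}.*?{1}",
        Regex.Escape(begin.ToString()), Regex.Escape(end.ToString())));
    return regex.Replace(s, string.Empty);
}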
Upvotes: 5
Views: 1572
Reputation: 10600
.NET regexes are more powerful than most, and you can surely do what you want. Take a look at this post, which matches balanced parentheses; that is essentially the same problem as yours, just with parentheses rather than quotes as the delimiters.
http://blogs.msdn.com/bclteam/archive/2005/03/15/396452.aspx
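For readers who cannot open the link, here is a minimal sketch of what balanced-parenthesis matching can look like in .NET, assuming the answer is referring to balancing group definitions. The class name `BalancedParens`, the method `RemoveBalanced`, and the group name `Depth` are made up for illustration. Note that this only shows the nesting technique; it does not yet treat quoted strings specially.

```csharp
using System.Text.RegularExpressions;

static class BalancedParens
{
    // Hypothetical helper: matches one outermost (...) group, including nested pairs.
    static readonly Regex Balanced = new Regex(@"
        \(                      # opening parenthesis
        (?>                     # atomic group: no backtracking into one iteration
            [^()]+              # any run of non-parenthesis characters
          | \( (?<Depth>)       # a nested opener pushes onto the Depth stack
          | \) (?<-Depth>)      # a nested closer pops from the Depth stack
        )*
        (?(Depth)(?!))          # fail if an opener is still unclosed
        \)                      # the matching closing parenthesis
        ", RegexOptions.IgnorePatternWhitespace);

    // Removes every balanced parenthesized group, delimiters included.
    public static string RemoveBalanced(string s)
    {
        return Balanced.Replace(s, string.Empty);
    }
}
```

With this sketch, `BalancedParens.RemoveBalanced("a (b (c) d) e")` returns `"a  e"` (the two surrounding spaces are left behind).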
Upvotes: 3
Reputation: 425013
I don't speak C#, but here's a Java implementation:
input.replaceAll("(?<=\\().*?(?=[\"()])(\"([^\"]*)\")?.*(?=\\))", "$2");
This produces the following results:
"foo (bar \"hello world\" foo) bar" --> "foo (hello world) bar"
"foo (bar foo) bar" --> "foo () bar"
It wasn't clear whether you wanted to preserve the quotes. If you do, use $1 instead of $2.
Now that you've got a working regex, you should be able to make it work for you in C#.
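A possible port of the same pattern to C#, as a sketch rather than a tested drop-in: .NET supports the lookbehind and lookahead used here, and a `$2` that refers to a group that did not participate is replaced with an empty string. The names `QuotedKeeper` and `KeepQuoted` are hypothetical.

```csharp
using System.Text.RegularExpressions;

static class QuotedKeeper
{
    // Same pattern as the Java version, written as a verbatim string
    // (a doubled "" inside @"..." stands for one literal quote character).
    const string Pattern = @"(?<=\().*?(?=[""()])(""([^""]*)"")?.*(?=\))";

    // Keeps only the quoted text (without its quotes) inside the parentheses.
    public static string KeepQuoted(string input)
    {
        return Regex.Replace(input, Pattern, "$2");
    }
}
```

Assuming the .NET engine behaves like Java's for this pattern, `QuotedKeeper.KeepQuoted("foo (bar \"hello world\" foo) bar")` should give `foo (hello world) bar`, matching the results listed above.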
Upvotes: 3
Reputation: 45252
It's risky to say "No you can't" on this forum, because somebody will go and ruin it by providing a working answer. :-)
But I will say that this would be really stretching regular expressions, and your problem elegantly lends itself to Automata-based programming.
Personally, I'm happier maintaining a 20-line finite state machine than a 10-character regular expression.
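To give an idea of what such a state machine might look like, here is a minimal sketch in C#. It rests on several assumptions about the requirements that the question does not spell out: the parentheses are removed along with their contents (as the original RemoveBetween does), quoted strings are always kept together with their quotes, parentheses inside quotes are treated as ordinary text, nesting is tracked with a depth counter, and there are no escaped quotes. The names `ParenStripper` and `RemoveOutsideQuotes` are hypothetical.

```csharp
using System.Text;

static class ParenStripper
{
    // Hypothetical helper: drops text between begin/end unless it is quoted.
    public static string RemoveOutsideQuotes(
        string s, char begin = '(', char end = ')', char quote = '"')
    {
        var result = new StringBuilder(s.Length);
        int depth = 0;          // number of unclosed begin delimiters
        bool inQuotes = false;  // true while between a pair of quote characters

        foreach (char c in s)
        {
            if (inQuotes)
            {
                // Quoted text is always kept, delimiters included.
                result.Append(c);
                if (c == quote) inQuotes = false;
            }
            else if (c == quote)
            {
                // Opening quote: keep it and switch to the quoted state.
                inQuotes = true;
                result.Append(c);
            }
            else if (c == begin)
            {
                depth++;        // enter a delimited region; drop the delimiter
            }
            else if (c == end && depth > 0)
            {
                depth--;        // leave a delimited region; drop the delimiter
            }
            else if (depth == 0)
            {
                result.Append(c); // ordinary text outside any delimiters
            }
            // Otherwise: unquoted text inside delimiters, which is discarded.
        }
        return result.ToString();
    }
}
```

Under those assumptions, `foo (bar "hello world" foo) bar` becomes `foo "hello world" bar`, and `say "(keep)" now` is returned unchanged. Changing any of the rules (keeping the parentheses, stripping the quotes, handling escapes) means editing one branch rather than rewriting a pattern.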
Upvotes: 2