Reputation: 1119
I'm trying to find the best approach to an issue I have. I need to extract comments from strings, where a comment is defined as the content between brackets at the end of the string. There can be a single comment, multiple comments, nested comments, or a combination of these.
Some examples:
this is a string (with comment)
this is another string (with comment)(and more comment)
this is yet another string (with comment (and some nested comment)
That's the easiest format, and it's fairly easy to separate using the following regex (Access VBA):
regex.Pattern = "^([^(]*)(\(.*\))+$"
I get the following correct output, where group1 is the value and group2 is the comment:
group1: this is a string / group2: (with comment)
group1: this is another string / group2: (with comment)(and more comment)
group1: this is yet another string / group2: (with comment (and some nested comment)
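To try the pattern outside Access, here's a minimal sketch that uses Python's `re` module as a stand-in for `VBScript.RegExp` (an assumption: both engines should treat this simple pattern the same way). `split_comment` is a hypothetical helper name. Group 1 keeps the space before the opening bracket, so it is stripped here.

```python
import re

# The pattern from the question: group 1 = everything before the first "(",
# group 2 = from that "(" up to the last ")" at the end of the string.
PATTERN = re.compile(r"^([^(]*)(\(.*\))+$")

def split_comment(text):
    """Return (value, comment), or None if the pattern does not match."""
    m = PATTERN.match(text)
    if m is None:
        return None
    # group(1) includes the space before "(", so strip it.
    return m.group(1).strip(), m.group(2)

for s in [
    "this is a string (with comment)",
    "this is another string (with comment)(and more comment)",
    "this is yet another string (with comment (and some nested comment)",
]:
    print(split_comment(s))
```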
The problem is that in some cases I have arrays, and those should fail. An array can be delimited either by a comma or by a slash. That part is straightforward, but these tokens can also be used for other purposes. So if a comma or slash is found in the string, it's considered an array, unless:
- the token is within the comment
- the slash is part of a fractional number
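As a possibly simpler alternative to one large regex, the two exceptions can be checked with a small character scanner before applying the original pattern. This is a sketch under assumptions: a "fractional number" is taken to mean a slash with a digit directly on both sides (as in 1/4), and any text inside brackets counts as comment. `looks_like_array` is a hypothetical name; the same loop can be ported to VBA with `Mid$` and a `For` loop.

```python
def looks_like_array(text):
    """Hypothetical helper: True if text has a comma or slash that is
    neither inside brackets nor part of a fraction such as 1/4."""
    depth = 0  # current bracket nesting level
    for i, ch in enumerate(text):
        if ch == "(":
            depth += 1
        elif ch == ")":
            # Guard against unbalanced input such as a stray ")".
            depth = max(depth - 1, 0)
        elif depth == 0 and ch == ",":
            return True
        elif depth == 0 and ch == "/":
            before = text[i - 1] if i > 0 else ""
            after = text[i + 1] if i + 1 < len(text) else ""
            # A slash between two digits is treated as a fraction.
            if not (before.isdigit() and after.isdigit()):
                return True
    return False

print(looks_like_array("this is string1 with a fractional 1/4 number (with comment)"))
print(looks_like_array("this is string1 (with comment1) / this is string2 (with comment2)"))
```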
Some examples:
this is string1 with a fractional 1/4 number (with comment)
this is string1 (with a fractional 1/4 in comment)
this is string1 (with comment1) / this is string2 (with comment2)
this is string1 (with some data, separated by a comma) , this is string2 (with comment3 / comment4)
this is string1 (with a fractional 1/4) / this is string2 (with comment2,comment3)
Added examples: the first one should fail, as it contains an array token (the slash) that is not part of a fractional number. The second one selects too much: it should take only the last comment, instead of everything from the first comment to the second.
this is string1 without comment / this is string2 (with comment2)
This is a string (with subcomment) where only the last should be selected (so this one)
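Running the question's pattern on the two added examples (again with Python's `re` as an assumed stand-in for `VBScript.RegExp`; `original_comment` is a hypothetical helper) shows both problems: the first string matches even though it should fail, and for the second string group 2 runs from the first "(" to the end.

```python
import re

# Same pattern as in the question.
PATTERN = re.compile(r"^([^(]*)(\(.*\))+$")

def original_comment(text):
    """Hypothetical helper: return group 2 of the question's pattern, or None."""
    m = PATTERN.match(text)
    return m.group(2) if m else None

print(original_comment("this is string1 without comment / this is string2 (with comment2)"))
print(original_comment("This is a string (with subcomment) where only the last should be selected (so this one)"))
```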
What's the best way to adjust the logic so that it fails on these repetitions unless the comma or slash falls under one of the exceptions? I keep ending up with monster code, so I'd like to see whether there are simpler options. The examples above should end up as follows:
ex1 / group1: this is string1 with a fractional 1/4 number / group2: (with comment)
ex2 / group1: this is string1 / group2: (with a fractional 1/4 in comment)
ex3 to ex5 should fail, as they are considered arrays and need some additional logic.
I hope this is clear.
Upvotes: 0
Views: 153
Reputation: 174806
I think you want something like this:
^((?:(?!\)\s*[,\/]).)*?)(\([^()]*\))$
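A quick check of this first version, with Python's `re` standing in for `VBScript.RegExp` (an assumption; `split_first` is a hypothetical helper). It rejects ex3 to ex5, but for a string with several trailing comments it captures only the last bracket group, and it still accepts "this is string1 without comment / this is string2 (with comment2)" because that slash does not follow a ")".

```python
import re

# First version from the answer. Group 2 is one non-nested (...) at the end.
# The negative lookahead is tested before each character of group 1, so no
# position in group 1 may start ")" + optional whitespace + "," or "/".
FIRST = re.compile(r"^((?:(?!\)\s*[,\/]).)*?)(\([^()]*\))$")

def split_first(text):
    """Hypothetical helper: (value, comment) or None."""
    m = FIRST.match(text)
    return (m.group(1).strip(), m.group(2)) if m else None

for s in [
    "this is string1 with a fractional 1/4 number (with comment)",
    "this is string1 (with comment1) / this is string2 (with comment2)",
    "this is another string (with comment)(and more comment)",
    "this is string1 without comment / this is string2 (with comment2)",
]:
    print(split_first(s))
```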
Update:
^(?=(?:(?!\)\s*[,\/]|\s\/\s).)*$)(.*?)((?:\([^()\n]*\))+)$
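The updated pattern has three parts: a lookahead at the start that scans the whole string and fails if a ")" is followed by optional whitespace and "," or "/", or if a "/" has whitespace on both sides; a lazy group 1 for the value; and group 2 as one or more adjacent, non-nested bracket groups at the end. Below is a sketch that checks it against the examples, once more assuming Python's `re` behaves like `VBScript.RegExp` here (`split_updated` is a hypothetical helper).

```python
import re

# Updated pattern from the answer.
# (?=...)  whole-string check: no ")" + optional whitespace + "," or "/",
#          and no "/" with whitespace on both sides.
# (.*?)    group 1, the value; lazy, so group 2 starts at the first
#          bracket of the trailing run.
# ((?:\([^()\n]*\))+)$  group 2, adjacent non-nested (...) groups at the end.
UPDATED = re.compile(
    r"^(?=(?:(?!\)\s*[,\/]|\s\/\s).)*$)(.*?)((?:\([^()\n]*\))+)$"
)

def split_updated(text):
    """Return (value, comment), or None when the string is rejected."""
    m = UPDATED.match(text)
    return (m.group(1).strip(), m.group(2)) if m else None

for s in [
    "this is string1 with a fractional 1/4 number (with comment)",
    "this is another string (with comment)(and more comment)",
    "this is string1 without comment / this is string2 (with comment2)",
    "This is a string (with subcomment) where only the last should be selected (so this one)",
]:
    print(split_updated(s))
```

A few edge cases may differ from the rules in the question, which may or may not matter for your data: a comma that does not directly follow a ")" (as in "string1, string2 (comment)") is still accepted, a spaced slash inside a comment (as in "(with comment3 / comment4)") causes a rejection, and balanced nested brackets such as "(a (b))" at the end do not match.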
Upvotes: 1