RegEx exclude one or another character

Question

I'd like to exclude either one character or another with a RegEx. I have a RegEx that searches for the pattern \[([^\[]+)\]\=\>(.*).

My problem is the last capture group. The string following the > should be followed by either a comma or a right parenthesis.

This is my text: Array([0]=>123,[1]=>Array([a]=>1,[b]=>2)) and I want to get:

// match 1
0 = 0
1 = 123

// match 2
0 = 1
1 = Array([a]=>1,[b]=>2)

This is my RegEx: \[([^\[]+)\]\=\>([^,\)]+)\)? but I get:

// match 1
0 = 0
1 = 123

// match 2
0 = 1
1 = Array([a]=>1

// match 3
0 = b
1 = 2
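The result above can be reproduced with a short script. This sketch assumes the pattern runs as a JavaScript regex with the g flag and uses String.prototype.matchAll (ES2020); the redundant escapes in \=\> are dropped because => matches the same text.

```javascript
// The question's pattern as a JavaScript regex (assumption: JS engine, g flag).
const text = "Array([0]=>123,[1]=>Array([a]=>1,[b]=>2))";
const attempt = /\[([^\[]+)\]=>([^,\)]+)\)?/g;

// Collect [group 1, group 2] for every match.
const results = [...text.matchAll(attempt)].map((m) => [m[1], m[2]]);
console.log(JSON.stringify(results));
// [["0","123"],["1","Array([a]=>1"],["b","2"]]
```

Group 2 stops at the comma inside the nested array, and the remainder of that array is then picked up as a separate third match.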

Krzysztof Kosiński · Accepted Answer

The character class [^,\)] explicitly excludes the comma, so the second capture group stops at the first comma and can never match Array([a]=>1,[b]=>2).

If you are OK with having only one level of nesting, you can try the following: \[([^\]]+)\]=>(Array\([^\)]+\)|[^,\)]+)?
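A quick check of the one-level pattern, again assuming a JavaScript regex engine; the helper name pairs is hypothetical. The second call shows where the one-level limit bites: [^\)]+ stops at the first ")", so a doubly nested value is cut short.

```javascript
// The one-level pattern as a JavaScript regex (assumption: JS engine, g flag).
const oneLevel = /\[([^\]]+)\]=>(Array\([^\)]+\)|[^,\)]+)?/g;

// Hypothetical helper: returns [key, value] pairs for every match in `text`.
function pairs(text) {
  return [...text.matchAll(oneLevel)].map((m) => [m[1], m[2]]);
}

console.log(JSON.stringify(pairs("Array([0]=>123,[1]=>Array([a]=>1,[b]=>2))")));
// [["0","123"],["1","Array([a]=>1,[b]=>2)"]]

// Two levels of nesting: the inner [^\)]+ stops at the first ")",
// so the captured value is unbalanced.
console.log(JSON.stringify(pairs("Array([0]=>Array([a]=>Array([x]=>1)))")));
// [["0","Array([a]=>Array([x]=>1)"]]
```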

If you want to have arbitrarily nested definitions of Array, this problem cannot be solved by using regular expressions, because the language you want to parse is not a regular language. You should use a parser generator or write a recursive-descent parser which implements the following grammar:

Start : Array
Array : "Array" "(" ElementList ")"
ElementList : "" | Elements
Elements : Element | Element "," Elements
Element : "[" String "]" "=>" Value
Value : Number | Array
Number : "0" | [1-9][0-9]*
String : [^\]]+
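The grammar maps directly onto a hand-written recursive-descent parser. The following is one possible JavaScript sketch, not code from the answer: parseArray is a hypothetical name, each grammar rule becomes one function, the result is nested plain objects keyed by the bracketed strings, the input is assumed to contain no whitespace (the grammar has none), and Number is matched with 0|[1-9][0-9]* so a value of 0 is accepted.

```javascript
// Hypothetical recursive-descent parser: one function per grammar rule.
// Assumes no whitespace in the input, as in the grammar.
function parseArray(input) {
  let pos = 0; // current read position in `input`

  // Consume a literal token or fail with its position.
  function expect(token) {
    if (!input.startsWith(token, pos)) {
      throw new SyntaxError(`Expected "${token}" at position ${pos}`);
    }
    pos += token.length;
  }

  // Array : "Array" "(" ElementList ")"
  function array() {
    expect("Array");
    expect("(");
    const result = {};
    // ElementList : "" | Elements
    if (input[pos] !== ")") {
      // Elements : Element | Element "," Elements
      for (;;) {
        const [key, val] = element();
        result[key] = val;
        if (input[pos] !== ",") break;
        pos++; // consume ","
      }
    }
    expect(")");
    return result;
  }

  // Element : "[" String "]" "=>" Value
  function element() {
    expect("[");
    const end = input.indexOf("]", pos); // String : [^\]]+
    if (end <= pos) throw new SyntaxError(`Expected a key at position ${pos}`);
    const key = input.slice(pos, end);
    pos = end;
    expect("]");
    expect("=>");
    return [key, value()];
  }

  // Value : Number | Array
  function value() {
    if (input.startsWith("Array", pos)) return array(); // recursion handles nesting
    const match = /^(?:0|[1-9][0-9]*)/.exec(input.slice(pos));
    if (!match) throw new SyntaxError(`Expected a value at position ${pos}`);
    pos += match[0].length;
    return Number(match[0]);
  }

  // Start : Array, and nothing may follow it.
  const result = array();
  if (pos !== input.length) {
    throw new SyntaxError(`Unexpected input at position ${pos}`);
  }
  return result;
}

const parsed = parseArray("Array([0]=>123,[1]=>Array([a]=>1,[b]=>2))");
console.log(JSON.stringify(parsed)); // {"0":123,"1":{"a":1,"b":2}}
```

Because value() calls array() for a nested Array, any depth of nesting is handled, which is exactly what the regex cannot do.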

Try looking for parser generators for JavaScript. PEG.js is an example: http://pegjs.majda.cz/
