Reputation: 9599
How do I, using a regular expression, split this string:
string = "a[a=d b&c[e[100&2=34]]] e[cheese=blue and white] x[a=a b]"
into this array:
string.split( regexp ) =>
[ "a[a=d b&c[e[100&2=34]]]", "e[cheese=blue and white]", "x[a=a b]" ]
The basic rule is that the string should be split at whitespace ( \s ), unless the whitespace is inside brackets ( [ ] ).
Upvotes: 1
Views: 357
Reputation: 4772
If the rule is this simple, I would suggest just doing it manually. Step through each character and keep track of your nesting level by increasing by 1 for each [ and decreasing by 1 for each ]. If you reach a space with nesting == 0 then split.
Edit: It is also worth mentioning that some languages have other pattern-matching facilities that natively support this sort of thing. For example, in Lua you can use '%b[]' to match balanced nested brackets. (Of course, Lua doesn't have a built-in split function...)
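Ruby is another language whose regex engine goes beyond classic regular expressions: it supports subexpression calls ( \g<name> ), which can match balanced brackets recursively. The following is a minimal sketch, assuming Ruby 2.0 or later; the names BALANCED_TOKEN and split_outside_brackets are made up for illustration. It collects the tokens with scan instead of splitting on the separators.

```ruby
# A token is a run of non-space, non-bracket characters and balanced
# [...] groups. The named group calls itself via \g<group> to handle nesting.
BALANCED_TOKEN = /
  (?:
    [^\s\[\]]
    |
    (?<group> \[ (?: [^\[\]] | \g<group> )* \] )
  )+
/x

# Hypothetical helper: returns the top-level tokens of the input string.
def split_outside_brackets(input)
  tokens = []
  # With named groups, scan yields captures, so read the full match from $&.
  input.scan(BALANCED_TOKEN) { tokens << $& }
  tokens
end

p split_outside_brackets("a[a=d b&c[e[100&2=34]]] e[cheese=blue and white] x[a=a b]")
# => ["a[a=d b&c[e[100&2=34]]]", "e[cheese=blue and white]", "x[a=a b]"]
```

This relies on an engine extension, so it will not carry over to regex flavours that lack recursion.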
Upvotes: 4
Reputation: 10772
Another option is a looping approach that deconstructs the nested brackets one level at a time; otherwise it's hard(TM) to be sure a single regexp will work as expected.
Here's an example in ruby:
str = "a[a=d b&c[e[100&2=34]]] e[cheese=blue and white] x[a=a b]"
left = str.dup
tokn = 0
toks = []
# Deconstruct: replace each innermost [...] with a {n} placeholder
loop do
  left.sub!(/\[[^\]\[]*\]/, "{#{tokn}}")
  break if $~.nil?
  toks[tokn] = $&
  tokn += 1
end
left = left.split(/\s+/)
# Reconstruct: restore the placeholders, highest n (outermost) first
(toks.size - 1).downto(0) do |n|
  left.each { |s| s.sub!("{#{n}}", toks[n]) }
end
The above uses {n} (where n is an integer) as a placeholder during the deconstruction, so original input that already contains text like {0} would break the reconstruction. It should illustrate the approach, though.
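To make that caveat concrete, here is a small sketch that wraps the same deconstruct/reconstruct idea in a method and feeds it an input that already contains {0}. The method name split_with_placeholders and the second test input are made up for illustration.

```ruby
# Hypothetical wrapper around the placeholder approach.
def split_with_placeholders(input)
  left = input.dup
  toks = []
  # Deconstruct: swap each innermost [...] group for a {n} placeholder.
  loop do
    replaced = left.sub!(/\[[^\]\[]*\]/) do |match|
      toks << match
      "{#{toks.size - 1}}"
    end
    break if replaced.nil?
  end
  parts = left.split(/\s+/)
  # Reconstruct: restore placeholders, highest n (outermost) first.
  (toks.size - 1).downto(0) do |n|
    parts.each { |part| part.sub!("{#{n}}") { toks[n] } }
  end
  parts
end

p split_with_placeholders("a[a=d b&c[e[100&2=34]]] e[cheese=blue and white] x[a=a b]")
# => ["a[a=d b&c[e[100&2=34]]]", "e[cheese=blue and white]", "x[a=a b]"]

# The literal {0} in the input is mistaken for a placeholder:
p split_with_placeholders("x{0} y[a b]")
# => ["x[a b]", "y[a b]"]
```

Picking a placeholder that cannot occur in the input would avoid the collision, but that has to be guaranteed by whoever controls the data.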
Writing code that does the split by iterating through the characters is simpler and safer, though.
Example in ruby:
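str = "a[a=d b&c[e[100&2=34]]] e[cheese=blue and white] x[a=a b]"
toks = []
level = st = en = 0
# each_char yields one-character strings, so the comparisons work on Ruby 1.9+
str.each_char do |c|
  en += 1
  level += 1 if c == '['
  level -= 1 if c == ']'
  # Split only at a space that is outside all brackets
  if level == 0 && c == ' '
    toks.push(str[st, en - 1 - st])
    st = en
  end
end
# Push whatever remains after the last split point
toks.push(str[st, en - st]) if st != en
p toks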
Upvotes: 0
Reputation:
Could you split on "(?<=\])\s(?=[a-z]\[)"? That is, a space preceded by a ] and followed by a letter and a [? This assumes you never have nested content inside brackets like "a[b=d[x=y b] g[w=v b]]", where the space before g would also be split.
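A quick sketch of that suggestion in Ruby, with the brackets escaped, shows both the case where it works and the nested case where it does not. The method name split_after_bracket is made up for illustration.

```ruby
# Split at a whitespace character that follows "]" and precedes "<letter>[".
SPLIT_AT = /(?<=\])\s(?=[a-z]\[)/

# Hypothetical helper wrapping the lookaround-based split.
def split_after_bracket(input)
  input.split(SPLIT_AT)
end

p split_after_bracket("a[a=d b&c[e[100&2=34]]] e[cheese=blue and white] x[a=a b]")
# => ["a[a=d b&c[e[100&2=34]]]", "e[cheese=blue and white]", "x[a=a b]"]

# Sibling groups nested inside brackets are split too, which is wrong here:
p split_after_bracket("a[b=d[x=y b] g[w=v b]]")
# => ["a[b=d[x=y b]", "g[w=v b]]"]
```

The pattern never counts nesting depth; it only looks at the characters next to the space, which is why the second input is cut in the middle.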
Upvotes: 0
Reputation: 328774
You can't; regular expressions are based on state machines, which don't have a "stack", so they cannot remember the number of nesting levels.
But maybe you can use a trick: Try to convert the string into a valid JSON string. Then you can use eval() to parse it into a JavaScript object.
Upvotes: 5