Reputation: 95598
I have a string that looks like this:
"#Text() #SomeMoreText() #TextThatContainsDelimiter(#blah) #SomethingElse()"
I'd like to get back
[#Text(), #SomeMoreText(), #TextThatContainsDelimiter(#blah), #SomethingElse()]
One way I thought about doing this was to require that the # be escaped as \#, which makes the input string:
"#Text() #SomeMoreText() #TextThatContainsDelimiter(\#blah) #SomethingElse()"
I can then split it using /[^\\]#/, which gives me:
[#Text(), SomeMoreText(), TextThatContainsDelimiter(\#blah), SomethingElse()]
The first element will contain #, but I can strip it out. However, is there a cleaner way to do this without having to escape the #, and which ensures that the first element will not contain a #? Basically I'd like it to split by # only if the # is not enclosed by parentheses.
My hunch is that since the # is context-sensitive and regular expressions are only suited for regular languages, this may not be the right tool. If so, would I have to write a grammar for this and roll my own parser/lexer?
Upvotes: 2
Views: 392
Reputation: 75272
From your example, it looks like you want to split on whitespace that's immediately followed by a hash symbol:
/\s+(?=#)/
That leaves the leading # on all the tokens, but you won't need to treat the first token specially. You could also use this:
/(?:^|\s+)#/
That would strip the hash symbols at the cost of generating an empty string as the first token, but some languages provide a way to discard empty leading tokens. Note that JavaScript does support lookaheads; lookbehinds are available only in engines that implement ES2018 or later.
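As a rough illustration, here is how the two patterns behave with JavaScript's String.prototype.split on the string from the question. The variable names are made up for the example, and the filter call is just one possible way to drop the empty leading token.

```javascript
const input = "#Text() #SomeMoreText() #TextThatContainsDelimiter(#blah) #SomethingElse()";

// Split on whitespace that is followed by '#'.
// The lookahead does not consume the '#', so every token keeps it.
const withHash = input.split(/\s+(?=#)/);
console.log(withHash);
// [ '#Text()', '#SomeMoreText()', '#TextThatContainsDelimiter(#blah)', '#SomethingElse()' ]

// Consume the '#' as part of the delimiter. The match at the start of the
// string produces an empty first element, which is filtered out here.
const withoutHash = input.split(/(?:^|\s+)#/).filter((token) => token !== "");
console.log(withoutHash);
// [ 'Text()', 'SomeMoreText()', 'TextThatContainsDelimiter(#blah)', 'SomethingElse()' ]
```

Both patterns rely on the nested # not being preceded by whitespace, which holds for the example but would not hold for something like (foo #blah).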
Upvotes: 2
Reputation: 354864
Argh! I tend to lose my abilities here. The regex (?<!\()(?=#) works:
PS Home:\> $s -split '(?<!\()(?=#)'
#Text()
#SomeMoreText()
#TextThatContainsDelimiter(#blah)
#SomethingElse()
This combines a negative lookbehind (to make sure there isn't an opening parenthesis preceding the #) and a positive lookahead to look for the #.
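The same pattern can be tried in JavaScript, assuming an engine with ES2018 lookbehind support. This is only a sketch: the variable names are illustrative, and the trim call is an addition, needed because the zero-width split leaves the separating space at the end of each token.

```javascript
const text = "#Text() #SomeMoreText() #TextThatContainsDelimiter(#blah) #SomethingElse()";

// Split at every position that is followed by '#' and not preceded by '('.
// The match is zero-width, so nothing is consumed; trailing spaces are trimmed.
const parts = text.split(/(?<!\()(?=#)/).map((part) => part.trim());
console.log(parts);
// [ '#Text()', '#SomeMoreText()', '#TextThatContainsDelimiter(#blah)', '#SomethingElse()' ]
```

The lookbehind only protects a # that comes directly after an opening parenthesis, so a # further inside the parentheses, as in (a#b), would still be treated as a split point.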
Upvotes: 2