Reputation: 76067
This is better explained with an example. I want to achieve a split like this:
two-separate-tokens-this--is--just--one--token-another
->
["two", "separate", "tokens", "this--is--just--one--token", "another"]
I naively tried str.split(/-(?!-)/), which doesn't match the first of two consecutive delimiters but does match the second (as it is not followed by the delimiter):
["two", "separate", "tokens", "this-", "is-", "just-", "one-", "token", "another"]
Do I have a better alternative than looping through the string?
By the way, the next step should be replacing the two consecutive delimiters with just one, so it's a way of escaping the delimiter by repeating it. The final result would be this:
["two", "separate", "tokens", "this-is-just-one-token", "another"]
If that can be achieved in just one step, that would be really awesome!
Upvotes: 6
Views: 2335
Reputation: 43673
str.match(/(?!-)(.*?[^\-])(?=(?:-(?!-)|$))/g);
Explanation:
The non-greedy pattern (?!-)(.*?[^\-]) matches a string that neither starts nor ends with a dash character, and the pattern (?=(?:-(?!-)|$)) requires such a match to be followed by a single dash character or by the end of the string. The /g modifier makes match find all occurrences, not just the first one.
Edit (based on OP's comment):
str.match(/(?:[^\-]|--)+/g);
Explanation:
The pattern (?:[^\-]|--) matches either a non-dash character or a double-dash string. The + quantifier repeats that group as many times as possible. The /g modifier makes match find all occurrences, not just the first one.
Note:
The pattern /(?:[^-]|--)+/g works in JavaScript as well, but JSLint requires the - inside square brackets to be escaped and reports an error otherwise.
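As a quick check, here is a small sketch that runs both patterns from this answer against the string from the question. The variable names are illustrative only and not part of the original answer.

```javascript
// Sample input taken from the question.
var input = "two-separate-tokens-this--is--just--one--token-another";

// First pattern: lazy match bounded by a lookahead for a single dash or the end.
var viaLookahead = input.match(/(?!-)(.*?[^\-])(?=(?:-(?!-)|$))/g);

// Second pattern: runs of non-dash characters or double dashes.
var viaAlternation = input.match(/(?:[^\-]|--)+/g);

console.log(viaLookahead);
console.log(viaAlternation);
// Both log: ["two", "separate", "tokens", "this--is--just--one--token", "another"]
```

For this input the two patterns produce the same tokens; the double dashes are still present and would need to be collapsed in a separate step.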
Upvotes: 8
Reputation: 75232
@Ωmega has the right idea in using match instead of split, but his regex is more complicated than it needs to be. Try this one:
s.match(/[^-]+(?:--[^-]+)*/g);
It reads exactly the way you expect it to work: Consume one or more non-hyphens, and if you encounter a double hyphen, consume that and go on consuming non-hyphens. Repeat as necessary.
EDIT: Apparently the source string may contain runs of two or more consecutive hyphens, which should not be treated as delimiters. That can be handled by adding a + to the second hyphen:
s.match(/[^-]+(?:--+[^-]+)*/g);
You can also use a {min,max} quantifier:
s.match(/[^-]+(?:-{2,}[^-]+)*/g);
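To make the difference between the first and the edited patterns concrete, here is a minimal sketch on a made-up input with a run of three hyphens. The input string and variable names are assumptions for illustration.

```javascript
// Hypothetical input containing a run of three hyphens.
var sample = "a-b---c";

// Exactly two hyphens allowed inside a token: the run of three is not consumed.
var strictPairs = sample.match(/[^-]+(?:--[^-]+)*/g);

// Two or more hyphens allowed inside a token.
var longRuns = sample.match(/[^-]+(?:-{2,}[^-]+)*/g);

console.log(strictPairs); // ["a", "b", "c"]
console.log(longRuns);    // ["a", "b---c"]
```

Note that all three patterns expect a token to start and end with a non-hyphen, so a trailing run such as the one in "a-b--" is left out of the match.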
Upvotes: 2
Reputation: 76067
Given that the regular expressions weren't very good with edge cases (like 5 consecutive delimiters), and that I still had to replace the double delimiters with a single one (which gets tricky too, because '----'.replace('--', '-') gives '---' rather than '--'), I wrote a function that loops over the characters and does everything in one go (although I'm concerned that using the string accumulator can be slow :-s):
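var f = function(id, delim) {
    var result = [];
    var acc = '';   // characters of the token being built
    var i = 0;
    while (i < id.length) {
        if (id[i] === delim) {
            if (id[i + 1] === delim) {
                // doubled delimiter: keep one copy and skip the second
                acc += delim;
                i++;
            } else {
                // single delimiter: the current token is complete
                result.push(acc);
                acc = '';
            }
        } else {
            acc += id[i];
        }
        i++;
    }
    // skip an empty last token
    if (acc !== '') {
        result.push(acc);
    }
    return result;
};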
and some tests:
> f('a-b--', '-')
["a", "b-"]
> f('a-b---', '-')
["a", "b-"]
> f('a-b---c', '-')
["a", "b-", "c"]
> f('a-b----c', '-')
["a", "b--c"]
> f('a-b----c-', '-')
["a", "b--c"]
> f('a-b----c-d', '-')
["a", "b--c", "d"]
> f('a-b-----c-d', '-')
["a", "b--", "c", "d"]
(If the last token is empty, it's meant to be skipped)
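On the replace problem mentioned at the top of this answer: a string pattern replaces only the first occurrence, whereas a regex with the /g flag replaces every non-overlapping pair. The sketch below combines that with a match-based tokenizer. It is only an illustration, not a drop-in replacement for the loop: splitEscaped is a hypothetical name, the delimiter is hard-coded to "-", and an input that starts with a single delimiter yields no empty leading token here, whereas the loop keeps one.

```javascript
// A string pattern replaces only the first occurrence; a /g regex replaces all.
console.log('----'.replace('--', '-'));  // "---"
console.log('----'.replace(/--/g, '-')); // "--"

// Hypothetical helper, hard-coded to "-" as the delimiter.
function splitEscaped(str) {
    // Tokens are runs of non-dash characters or doubled dashes.
    // match returns null when nothing matches, hence the fallback.
    var tokens = str.match(/(?:[^-]|--)+/g) || [];
    // Collapse each doubled dash into a single one.
    return tokens.map(function (token) {
        return token.replace(/--/g, '-');
    });
}

console.log(splitEscaped('a-b---c'));     // ["a", "b-", "c"]
console.log(splitEscaped('a-b-----c-d')); // ["a", "b--", "c", "d"]
```

For the test inputs listed above, this sketch gives the same arrays as the loop.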
Upvotes: 0
Reputation: 10738
You can achieve this without a negative lookbehind (which, as @jbabey mentioned, is not supported in JS) by using word boundaries instead (inspired by this article):
\b-\b
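A short sketch of how this pattern behaves with split. It relies on every token starting and ending with a word character (a letter, digit, or underscore); the second input is a made-up case where that does not hold, and the variable names are illustrative.

```javascript
// Sample input taken from the question: every token starts and ends with a letter.
var wordy = "two-separate-tokens-this--is--just--one--token-another";
var wordyParts = wordy.split(/\b-\b/);
console.log(wordyParts);
// ["two", "separate", "tokens", "this--is--just--one--token", "another"]

// Hypothetical input where a token ends with a non-word character:
// there is no word boundary between "!" and "-", so nothing is split.
var punctuatedParts = "a!-b".split(/\b-\b/);
console.log(punctuatedParts); // ["a!-b"]
```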
Upvotes: 0
Reputation: 707376
I don't know how to do it purely with the regex engine in JS. You could do it the following way, which is a little less involved than parsing manually:
var str = "two-separate-tokens-this--is--just--one--token-another";
str = str.replace(/--/g, "#!!#");
var split = str.split(/-/);
for (var i = 0; i < split.length; i++) {
    split[i] = split[i].replace(/#!!#/g, "--");
}
Working demo: http://jsfiddle.net/jfriend00/hAhAB/
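The same placeholder idea can also produce the final, unescaped tokens the question asks for by restoring each placeholder as a single dash. This is a sketch under assumptions: splitWithPlaceholder is a hypothetical name, and the NUL character is used as the placeholder on the assumption that it never appears in the input.

```javascript
// Hypothetical helper; assumes the input never contains the NUL character.
function splitWithPlaceholder(str) {
    var PLACEHOLDER = '\u0000';
    return str
        .replace(/--/g, PLACEHOLDER) // hide the escaped (doubled) delimiters
        .split('-')                  // split on the remaining single delimiters
        .map(function (token) {
            // restore each escaped delimiter as a single dash
            return token.split(PLACEHOLDER).join('-');
        });
}

console.log(splitWithPlaceholder("two-separate-tokens-this--is--just--one--token-another"));
// ["two", "separate", "tokens", "this-is-just-one-token", "another"]
```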
Upvotes: 0
Reputation: 46647
You would need a negative lookbehind assertion as well as your negative lookahead:
(?<!-)-(?!-)
Unfortunately, JavaScript's regular expression engine did not support lookbehind assertions when this was written (they were added to the language in ES2018). Without them, I believe the only workaround is to inspect your results afterwards and remove any matches that would have failed the lookbehind assertion (or, in this case, combine them back into a single match).
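In engines that implement ES2018 lookbehind assertions (recent versions of Node.js and the major browsers), the pattern above can be passed to split directly. A minimal sketch, assuming such an engine; the variable names are illustrative.

```javascript
// Sample input taken from the question.
const text = "two-separate-tokens-this--is--just--one--token-another";

// Split on a dash that is neither preceded nor followed by another dash.
// Requires an engine with ES2018 lookbehind support.
const parts = text.split(/(?<!-)-(?!-)/);

console.log(parts);
// ["two", "separate", "tokens", "this--is--just--one--token", "another"]
```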
Upvotes: 2