Reputation: 1877
(I saw this topic has a LOT of answers but I couldn't find one that fits)
I am writing a small parser in JavaScript that splits text into sections like this:
var tex = "hello this :word is apart"
var parsed = [
"hello",
" ",
"this",
" ",
// ":word" should not appear here, and neither should "word"
" ",
"is",
" ",
"apart"
]
The regex that does exactly this is:
/((?!:[a-z]+)([ ]+|(?<= |^)[a-z]*(?= |$)))/g
But it uses a positive lookbehind, which, as I read, was only added to JavaScript in 2018 (ES2018), so I expect browser compatibility problems, and I would like to keep at least some compatibility.
To be clear, I need the words AND ALL the spaces, and I need to exclude some words. I am open to other methods, including ones that do not use regex.
I considered removing the space checks and ordering the alternatives in my regex so that ":word" is captured by the "special words" group before anything else.
Would that work in JavaScript, and be reliable?
I tried
/((:[a-z]+)|([ ]+)|([a-z]*))/g
in https://regexr.com/ and it seems to work. Will it work in every case?
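The ordering idea can be checked with a small sketch like the one below. It relies on the fact that JavaScript tries alternatives from left to right at each position, so a ":word" token is consumed whole by the first group before the plain-word group can see it. The function name `tokenize` is hypothetical, and the sketch assumes `[a-z]+` instead of `[a-z]*` so that the pattern never produces empty matches.

```javascript
// Hypothetical helper: tokenizes text into words and runs of spaces,
// skipping ":word" tokens. No lookbehind is needed.
function tokenize(text) {
  // Assumption: `+` instead of `*` on the last group, to avoid empty matches.
  const pattern = /(:[a-z]+)|([ ]+)|([a-z]+)/g;
  const tokens = [];
  let m;
  while ((m = pattern.exec(text)) !== null) {
    // Group 1 only participates for the ":word" form; skip those matches
    // and keep everything else (spaces and plain words).
    if (m[1] === undefined) {
      tokens.push(m[0]);
    }
  }
  return tokens;
}

console.log(tokenize("hello this :word is apart"));
// → ["hello", " ", "this", " ", " ", "is", " ", "apart"]
```

Characters that none of the three groups match (uppercase letters, digits, punctuation) are silently skipped by this sketch, so it only covers lowercase input like the example.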
Upvotes: 0
Views: 81
Reputation: 10930
I would use two regexes. The first one matches the words you DON'T want, so that you can replace them with an empty string. This is the simple regex:
/:\w+/g
Then replace the matches with an empty string. Now you have a string that can be parsed with this regex:
/([ ]+)|([a-z]*)/g
which is a simplified version of your second regex, since the forbidden words are already gone.
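Put together, the two steps might look like the following sketch. The function name `parseWithoutMarked` is hypothetical, and the second pattern assumes `[a-z]+` rather than `[a-z]*` so that `match` does not return empty strings.

```javascript
// Hypothetical helper combining the two steps described above.
function parseWithoutMarked(text) {
  // Step 1: remove every ":word" token.
  const cleaned = text.replace(/:\w+/g, "");
  // Step 2: split what is left into runs of spaces and runs of lowercase letters.
  // match() returns null when nothing matches, so fall back to an empty array.
  return cleaned.match(/[ ]+|[a-z]+/g) || [];
}

console.log(parseWithoutMarked("hello this :word is apart"));
// → ["hello", " ", "this", "  ", "is", " ", "apart"]
```

Note that the spaces on either side of the removed word end up in a single two-space token here, rather than as two separate one-space tokens.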
Upvotes: 1
Reputation: 3604
You said you're open to non-regex solutions, but I can give you one that uses both. Since you can't rely on lookbehind being supported, just capture everything and then filter out what you don't want: words preceded by a colon.
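const text = 'hello this :word is apart';
// Match plain words, ":word" tokens, and runs of whitespace as separate tokens.
const regex = /(\w+)|(:\w+)|(\s+)/g;
// match() returns null when nothing matches, so fall back to an empty array,
// then drop every token that contains a colon.
const parsed = (text.match(regex) || []).filter(word => !word.includes(':'));
console.log(parsed);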
Upvotes: 1