Reputation: 971
I'm trying to match a sequence of separators, but exclude a hyphen whenever it has a letter character immediately before and after it.
For example, [\u002D\u0020] will match all spaces and hyphens in:
I have wi-fi
However, I don't want the hyphen in wi-fi to match, since it has a letter character before and after it (e.g. \w+\u002D\w+).
How do I negate a sequence while matching a character set? Also, is \w limited to Latin letter characters? Is the engine culture-aware, for Arabic and Turkish for example?
EDIT: Just to explain further what I'm trying to achieve: I want to collect all punctuation and specific characters from a sentence (e.g. -+#$%, etc.) and ignore all words.
Whenever there's a hyphenated word (e.g. state-of-the-art), I wish to ignore the whole word. For "this is# a %state-of-the-art design", I intend to get the following collection: "#, %".
Upvotes: 2
Views: 701
Reputation: 626950
Try matching all hyphenated words, and matching and capturing non-word chars in all other contexts, using XRegExp:
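var s = "this is# a %statè-òf-thè-árt or state-of-the-art design";
// Alternative 1 consumes hyphenated words; alternative 2 captures any other symbol in Group 1
var rx = XRegExp("\\p{L}+(?:-\\p{L}+)+|([^\\p{L}\\p{N}_ ])","g");
var res = [];
XRegExp.forEach(s, rx, function(match, i) {
  // Group 1 is set only when a symbol (not a hyphenated word) matched
  if (match[1]) res.push(match[1]);
});
console.log(res);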
<script src="https://cdnjs.cloudflare.com/ajax/libs/xregexp/2.0.0/xregexp-all-min.js"></script>
The pattern matches:
- \\p{L}+(?:-\\p{L}+)+ - one or more letters (\\p{L}+) followed with 1 or more sequences of - and 1+ letters again
- | - or
- ([^\\p{L}\\p{N}_ ]) - Group 1 capturing one char other than space, _, letters (\\p{L}) and digits (\\p{N}).
Only the contents of Group 1 should be pushed to the resulting array.
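As a side note on the \w part of the question: in JavaScript, \w only covers [A-Za-z0-9_], so it does not match letters such as è or Arabic letters. If the target environment supports Unicode property escapes (the u flag, ES2018) and String.prototype.matchAll (ES2020), the same pattern can probably be written without XRegExp. The following is only a sketch under those assumptions; the function name collectSymbols is made up for illustration and is not part of any library.

```javascript
// Sketch: the same two-alternative pattern using native Unicode property
// escapes. Assumes an engine with the `u` flag and matchAll support.
// `collectSymbols` is a hypothetical helper name.
function collectSymbols(text) {
  // Alternative 1 consumes hyphenated words without capturing anything.
  // Alternative 2 captures one char that is not a letter, digit, _ or space.
  const rx = /\p{L}+(?:-\p{L}+)+|([^\p{L}\p{N}_ ])/gu;
  const result = [];
  for (const match of text.matchAll(rx)) {
    // Group 1 is undefined when a hyphenated word matched.
    if (match[1] !== undefined) result.push(match[1]);
  }
  return result;
}

const s = "this is# a %statè-òf-thè-árt or state-of-the-art design";
console.log(collectSymbols(s));

// \w is ASCII-only, while \p{L} (with the u flag) matches any Unicode letter.
console.log(/^\w$/.test("è"));     // false
console.log(/^\p{L}$/u.test("è")); // true
```

Note that \p{L} does not cover combining marks, so text stored in decomposed form (a base letter followed by a combining accent) may need text.normalize("NFC") first.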
Upvotes: 1