Reputation: 416
Is there a way to split a string on several separators while keeping some of those separators in the resulting array?
So if I have the string "This is a-weird string,right?"
I would like to get
["This", "is", "a", "-", "weird", "string", ",", "right", "?"]
I have tried using string.split(/([^a-zA-Z])/g), but that also keeps the whitespace, which I don't want. This guide looks like something I could use, but my understanding of regex is not good enough to know how to combine the two.
Upvotes: 3
Views: 553
Reputation: 5828
Try like this:
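const str = "This is a-weird string,right?";
// Put a space on both sides of each delimiter to keep (",", "-", "?"), then split on spaces.
const arr = str.replace(/(\S)([,\-?])/g, "$1 $2").replace(/([,\-?])(\S)/g, "$1 $2").split(" ");
console.log(arr); // ["This", "is", "a", "-", "weird", "string", ",", "right", "?"]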
For each delimiter you're interested in, use replace to put a space on each side of it, then split on spaces to get the array.
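The same pad-and-split idea can be sketched as a reusable function. This is only an illustration: the name splitKeeping and its delimiters parameter are made up here, not part of the answer above, and it assumes the delimiters are short literal strings.

```javascript
// Hypothetical helper (not from the original answer): pads every kept
// delimiter with spaces, then splits on whitespace.
function splitKeeping(text, delimiters) {
  // Escape regex metacharacters so each delimiter is matched literally.
  const escaped = delimiters.map((d) => d.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"));
  const pattern = new RegExp(`(${escaped.join("|")})`, "g");
  return text
    .replace(pattern, " $1 ") // surround each delimiter with spaces
    .split(/\s+/)             // split on runs of whitespace
    .filter(Boolean);         // drop empty strings left at the edges
}

console.log(splitKeeping("This is a-weird string,right?", [",", "-", "?"]));
// ["This", "is", "a", "-", "weird", "string", ",", "right", "?"]
```

Splitting on /\s+/ and filtering empty strings means a delimiter at the very start or end of the input does not leave an empty entry behind.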
Upvotes: 1
Reputation: 626691
You can use
console.log("This is a-weird string,right?".match(/[^\W_]+|[^\w\s]|_/g))
The regex matches:
[^\W_]+ - one or more alphanumeric chars
| - or
[^\w\s] - any char other than a word or whitespace char
| - or
_ - an underscore.
See the regex demo.
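One thing to watch with match is that it returns null rather than an empty array when nothing matches. A minimal sketch of a wrapper that guards against that (the name tokenize is hypothetical, chosen only for this example):

```javascript
// Hypothetical wrapper around the match-based pattern above.
function tokenize(text) {
  // String.prototype.match returns null when there is no match,
  // so fall back to an empty array.
  return text.match(/[^\W_]+|[^\w\s]|_/g) || [];
}

console.log(tokenize("This is a-weird string,right?"));
// ["This", "is", "a", "-", "weird", "string", ",", "right", "?"]
console.log(tokenize("snake_case")); // ["snake", "_", "case"]
console.log(tokenize("   "));        // []
```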
A fully Unicode-aware regex would be
console.log("This is ą-węird string,right?".match(/[\p{L}\p{M}\p{N}]+|[\p{P}\p{S}]/gu))
Here,
[\p{L}\p{M}\p{N}]+ - one or more Unicode letters, diacritics or digits
| - or
[\p{P}\p{S}] - a single punctuation proper or symbol char.
See this regex demo.
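To see why the Unicode-aware version matters, the two patterns can be run side by side on the accented sample. The variable names below are just for this sketch; it assumes an engine that supports Unicode property escapes together with the u flag.

```javascript
// Both patterns from this answer, applied to the same accented input.
const asciiPattern = /[^\W_]+|[^\w\s]|_/g;
const unicodePattern = /[\p{L}\p{M}\p{N}]+|[\p{P}\p{S}]/gu;

const sample = "This is ą-węird string,right?";

// \w only covers ASCII letters, digits and underscore, so the accented
// letters are treated as separate non-word tokens and "węird" is broken up.
const asciiTokens = sample.match(asciiPattern) || [];

// \p{L}, \p{M} and \p{N} cover letters, marks and digits in any script,
// so accented words stay in one piece.
const unicodeTokens = sample.match(unicodePattern) || [];

console.log(asciiTokens);
console.log(unicodeTokens);
```

The second call keeps "węird" as a single token, while the first one produces more, shorter tokens for the same input.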
Upvotes: 4
Reputation: 520878
Here is a regex splitting approach. We can try splitting on the following pattern:
\s+|(?<=\w)(?=\W)|(?<=\W)(?=\w)
Code snippet:
var input = "This is a-weird string,right?";
var parts = input.split(/\s+|(?<=\w)(?=\W)|(?<=\W)(?=\w)/);
console.log(parts);
Here is an explanation of the regex pattern used, which says to split on:
\s+              whitespace
|                OR
(?<=\w)(?=\W)    the boundary between a word character preceding and a non-word character following
|                OR
(?<=\W)(?=\w)    the boundary between a non-word character preceding and a word character following
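One caveat with a pure split is that leading or trailing whitespace produces empty strings in the result. A small sketch that guards against this (the function name splitOnBoundaries is hypothetical; it assumes an engine with lookbehind support, which was standardized in ES2018):

```javascript
// Hypothetical wrapper around the split pattern above.
function splitOnBoundaries(text) {
  return text
    .trim()                                    // avoid empty strings at the edges
    .split(/\s+|(?<=\w)(?=\W)|(?<=\W)(?=\w)/)  // whitespace or a word/non-word boundary
    .filter((part) => part !== "");            // "" input would otherwise give [""]
}

console.log(splitOnBoundaries("This is a-weird string,right?"));
// ["This", "is", "a", "-", "weird", "string", ",", "right", "?"]
console.log(splitOnBoundaries("  Hello, world!  "));
// ["Hello", ",", "world", "!"]
```

Note that consecutive non-word characters such as "?!" have no boundary between them, so they stay together as one element.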
Upvotes: 2