THE AMAZING

Reputation: 1593

JavaScript break sentence by words

What's a good strategy for getting whole words into an array, each with its succeeding character?

Example.

This is an amazing sentence.

Array(
[0] => This 
[1] => is
[2] => an
[3] => amazing
[4] => sentence.
)

Elements 0-3 would each keep their succeeding space, while element 4 keeps its succeeding period.

I need to split the sentence on the spacing character. Then, as the array elements are injected into an element, once that element's width reaches X, break onto a new line.

Please, gawd, don't give tons of code. I prefer to write my own; just tell me how you would do it.

Upvotes: 54

Views: 134974

Answers (9)

Clement

Reputation: 4297

The following solution splits words not only on the space character but also on other kinds of whitespace and on punctuation. It also works with non-ASCII characters.

It matches words by considering only characters that belong to certain Unicode categories. It allows letters (L), numbers (N), symbols (S) and marks (M), so it matches quite a broad set, but you can adjust it if you need a different set of characters. Other categories such as punctuation (P) and separators (Z) are not included and will therefore not match.

input.match(/[\p{L}\p{N}\p{S}\p{M}]+/gu)

Example

' \t a 件数 😀 ,;-asd'.match(/[\p{L}\p{N}\p{S}\p{M}]+/gu)

Returns ['a', '件数', '😀', 'asd']

Upvotes: 4

Maciej Bledkowski

Reputation: 703

It can be done with the split function:

"This is an amazing sentence.".split(' ')

Upvotes: 1

Penny Liu

Reputation: 17408

Use split, then filter out the empty strings that leading, trailing, and repeated spaces produce.

let str = '     This is an amazing sentence.  ',
  words = str.split(' ').filter(w => w !== '');

console.log(words);

Upvotes: 8

Ravi Rajendra

Reputation: 718

If you need to keep the spaces and the dots, the easiest way would be:

"This is an amazing sentence.".match(/.*?[\.\s]+?/g);

The result would be:

['This ','is ','an ','amazing ','sentence.']
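
One caveat with the pattern above: every match has to end in a period or a whitespace character, so a final word with no trailing punctuation is left out of the result. A minimal sketch of a variant that keeps it, assuming the goal is "each word plus whatever whitespace follows it" (the pattern `/\S+\s*/g` is a substitution, not the regex from this answer):

```javascript
// Variant pattern (an assumption, not the answer's regex): a run of
// non-whitespace characters followed by any whitespace that trails it.
const withDot = 'This is an amazing sentence.'.match(/\S+\s*/g);
const withoutDot = 'This is an amazing sentence'.match(/\S+\s*/g);

console.log(withDot);    // words keep their trailing space, the last keeps its period
console.log(withoutDot); // the last word is still captured without a period
```

Unlike the original pattern, this one treats a period in the middle of a token (as in "3.14") as part of the word rather than as a break point.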

Upvotes: 9

Penny Liu

Reputation: 17408

This can be done with lodash _.words:

var str = 'This is an amazing sentence.';
console.log(_.words(str, /[^, ]+/g));
<script src="https://cdnjs.cloudflare.com/ajax/libs/lodash.js/4.17.11/lodash.min.js"></script>

Upvotes: 2

Isaac

Reputation: 10834

Similar to Ravi's answer, use match, but use the word boundary \b in the regex to split on word boundaries:

'This is  a test.  This is only a test.'.match(/\b(\w+)\b/g)

yields

["This", "is", "a", "test", "This", "is", "only", "a", "test"]

or

'This is  a test.  This is only a test.'.match(/\b(\w+\W+)/g)

yields

["This ", "is  ", "a ", "test.  ", "This ", "is ", "only ", "a ", "test."]
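
Tokens of this shape can then be fed into the line-breaking step the question describes. The following is only a sketch under stated assumptions: `wrapTokens` and `maxWidth` are hypothetical names, width is measured in characters rather than rendered pixels, and the regex uses `\W*` instead of `\W+` so that a final word with no trailing punctuation is still captured.

```javascript
// Hypothetical helper: wrapTokens and maxWidth are illustrative names.
function wrapTokens(text, maxWidth) {
  // Each token is a word plus any non-word characters that follow it.
  const tokens = text.match(/\b(\w+\W*)/g) || [];
  const lines = [];
  let line = '';
  for (const token of tokens) {
    // Assumption: width is a character count that ignores trailing whitespace.
    if (line !== '' && (line + token).trimEnd().length > maxWidth) {
      lines.push(line.trimEnd());
      line = '';
    }
    line += token;
  }
  // Flush whatever is left on the last line.
  if (line !== '') lines.push(line.trimEnd());
  return lines;
}

const wrapped = wrapTokens('This is an amazing sentence.', 12);
console.log(wrapped); // ["This is an", "amazing", "sentence."]
```

For pixel widths in a browser, the character count would have to be replaced by a real measurement of the rendered text; that part is left out here.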

Upvotes: 80

Carsten Massmann

Reputation: 28196

Try this:

var words = str.replace(/([ .,;]+)/g,'$1§sep§').split('§sep§');

This will

  1. insert a marker §sep§ after every chosen delimiter [ .,;]+
  2. split the string at the marked positions, thereby preserving the actual delimiters.
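
Applied to the sentence from the question, the two steps look roughly like this. Note that when the string ends with a delimiter, the marker is also inserted at the very end, so split returns a trailing empty string; the filter call below is an addition to drop it. The sketch also assumes the marker text §sep§ never occurs in the input.

```javascript
const str = 'This is an amazing sentence.';

const words = str
  .replace(/([ .,;]+)/g, '$1§sep§') // step 1: mark the end of every delimiter run
  .split('§sep§')                   // step 2: cut at the markers, keeping delimiters
  .filter(w => w !== '');           // added: drop the empty string left after the final "."

console.log(words); // ["This ", "is ", "an ", "amazing ", "sentence."]
```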

Upvotes: 20

doogle

Reputation: 3406

Here is an option if you want to keep the trailing space with each word and do it in a single O(N) pass over the string:

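var str = "This is an amazing sentence.";
var words = [];
var buf = "";
for (var i = 0; i < str.length; i++) {
    buf += str[i];
    // A space ends the current word; the space stays attached to it.
    if (str[i] === " ") {
        words.push(buf);
        buf = "";
    }
}

// Flush the last word, which has no trailing space.
if (buf.length > 0) {
    words.push(buf);
}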

Upvotes: 3

h2ooooooo

Reputation: 39532

Just use split:

var str = "This is an amazing sentence.";
var words = str.split(" ");
console.log(words);
//["This", "is", "an", "amazing", "sentence."]

And if you need each word with its trailing space, why not just add it back in a loop afterwards?

var str = "This is an amazing sentence.";
var words = str.split(" ");
for (var i = 0; i < words.length - 1; i++) {
    words[i] += " ";
}
console.log(words);
//["This ", "is ", "an ", "amazing ", "sentence."]

Oh, and sleep well!

Upvotes: 69
