alex

Reputation: 7601

Splitting a string into an array of n words

I'm trying to turn this:

"This is a test this is a test"

into this:

["This is a", "test this is", "a test"]

I tried this:

const re = /\b[\w']+(?:[^\w\n]+[\w']+){0,2}\b/
const wordList = sample.split(re)
console.log(wordList)

But I got this:

[ '',
  ' ',
  ' ']

Why is this?

(The rule is to split the string every N words.)

Upvotes: 8

Views: 2917

Answers (5)

Toto

Reputation: 91415

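You could split like this:

var str = 'This is a test this is a test';
// The capture group makes split() keep each matched chunk in the result.
// \s* (rather than \s+) lets the last word, which has no trailing space, join its chunk.
var wrd = str.split(/((?:\w+\s*){1,3})/);
console.log(wrd);

However, you have to remove the empty elements from the array and trim the trailing whitespace from each chunk.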
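The cleanup step can be sketched as follows. This is only a minimal sketch, not necessarily what the answer intended: the helper name `splitEveryThreeWords` is hypothetical, and the pattern uses `\s*` on the assumption that the last word has no trailing whitespace.

```javascript
// Hypothetical helper: chunk a string into groups of up to three words
// using split() with a capturing group, then clean up the result.
function splitEveryThreeWords(str) {
  return str
    .split(/((?:\w+\s*){1,3})/) // the capture group keeps the matched chunks
    .filter(Boolean)            // drop the empty strings left between chunks
    .map(function (chunk) {
      return chunk.trim();      // remove trailing whitespace from each chunk
    });
}

console.log(splitEveryThreeWords('This is a test this is a test'));
// [ 'This is a', 'test this is', 'a test' ]
```

Note that `\w` does not match apostrophes or other punctuation, so input such as "don't" would need a wider character class.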

Upvotes: 1

Pranav C Balan

Reputation: 115222

The String#split method treats whatever the regular expression matches as the delimiter, so the matched text is not included in the result array; only the pieces between matches are.

Use the String#match method with a global flag (g) on your regular expression instead:

var sample = "This is a test this is a test";

const re = /\b[\w']+(?:\s+[\w']+){0,2}/g;
const wordList = sample.match(re);
console.log(wordList);

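To make the difference concrete, here is a small sketch that runs the same pattern through both methods. The function name `compareSplitAndMatch` is made up for illustration.

```javascript
// Hypothetical helper: apply one pattern with both split() and match().
function compareSplitAndMatch(sample) {
  var pattern = /\b[\w']+(?:\s+[\w']+){0,2}/g;
  return {
    // split() discards every match and returns what is left between them.
    leftovers: sample.split(pattern),
    // match() with the g flag returns the matched substrings themselves.
    chunks: sample.match(pattern)
  };
}

var result = compareSplitAndMatch("This is a test this is a test");
console.log(result.leftovers); // only empty strings and single spaces
console.log(result.chunks);    // [ 'This is a', 'test this is', 'a test' ]
```

This is why the attempt in the question printed nothing but blanks: the three-word groups were consumed as delimiters.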

Upvotes: 11

Tala

Reputation: 927

Use the whitespace character class (\s) and the match function instead of split:

var wordList = sample.match(/\s?(?:\w+\s?){1,3}/g);

split breaks the string wherever the regex matches, whereas match returns the matched text itself. Note that each chunk may keep a trailing space, so trim the results if that matters.

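Since the question asks for a split every N words, the match approach can be generalized by building the pattern at run time. The following is a hedged sketch: `chunkWords` is a hypothetical name, and it assumes a "word" is a run of `\w` characters and apostrophes.

```javascript
// Hypothetical helper: group words into chunks of at most n words.
function chunkWords(text, n) {
  if (!Number.isInteger(n) || n < 1) {
    throw new RangeError('n must be a positive integer');
  }
  // One word, followed by up to n - 1 further "whitespace + word" pairs.
  var pattern = new RegExp("[\\w']+(?:\\s+[\\w']+){0," + (n - 1) + "}", 'g');
  // match() returns null when nothing matches, so fall back to an empty array.
  return text.match(pattern) || [];
}

console.log(chunkWords('This is a test this is a test', 3));
// [ 'This is a', 'test this is', 'a test' ]
console.log(chunkWords('This is a test this is a test', 2));
// [ 'This is', 'a test', 'this is', 'a test' ]
```

Because the pattern begins and ends on a word, the chunks come back without leading or trailing whitespace.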

Upvotes: 1

Rajesh

Reputation: 24915

As an alternative approach, you can split the string on spaces and then merge the words back together in batches.

function splitByWordCount(str, count) {
  var arr = str.split(' ')
  var r = [];
  while (arr.length) {
    r.push(arr.splice(0, count).join(' '))
  }
  return r;
}

var a = "This is a test this is a test";
console.log(splitByWordCount(a, 3))
console.log(splitByWordCount(a, 2))
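Splitting on a single space produces empty "words" when the input contains repeated spaces, tabs, or leading/trailing whitespace. A possible variant that tolerates this is sketched below; `chunkByWordCount` is a hypothetical name, and it assumes `count` is a positive integer.

```javascript
// Hypothetical variant: same batching idea, but tolerant of irregular whitespace.
// Assumes count is a positive integer (a count of 0 would never advance the loop).
function chunkByWordCount(str, count) {
  var trimmed = str.trim();
  if (trimmed === '') {
    return []; // no words at all
  }
  var words = trimmed.split(/\s+/); // split on any run of whitespace
  var chunks = [];
  for (var i = 0; i < words.length; i += count) {
    // slice() copies a batch without mutating the words array
    chunks.push(words.slice(i, i + count).join(' '));
  }
  return chunks;
}

console.log(chunkByWordCount('  This is   a test this\tis a test ', 3));
// [ 'This is a', 'test this is', 'a test' ]
```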

Upvotes: 8

RizkiDPrast

Reputation: 1725

Your regex is fine, just not with split: split treats whatever the pattern matches as a delimiter and removes it from the result. For instance:

var arr = "1, 1, 1, 1";
arr.split(',');  // ['1', ' 1', ' 1', ' 1']
// but
arr.split(1);    // ['', ', ', ', ', ', ', '']

Instead, use match or exec, like this:

var x = "This is a test this is a test";
var re = /\b[\w']+(?:[^\w\n]+[\w']+){0,2}\b/g
var y = x.match(re);
console.log(y);
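For completeness, the exec route might look like the sketch below. It is only an illustration (`collectChunks` is a hypothetical name); compared with match, it also exposes the position of each chunk.

```javascript
// Hypothetical helper: collect every chunk and its start index via exec().
function collectChunks(text) {
  var re = /\b[\w']+(?:[^\w\n]+[\w']+){0,2}\b/g;
  var chunks = [];
  var m;
  // With the g flag, each exec() call resumes from re.lastIndex
  // and returns null once no further match exists.
  while ((m = re.exec(text)) !== null) {
    chunks.push({ text: m[0], index: m.index });
  }
  return chunks;
}

console.log(collectChunks("This is a test this is a test"));
```

The g flag is essential here: without it, exec would return the first match on every call and the loop would never end.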

Upvotes: 4
