Reputation: 3539
I need a tokenizer that, given a string with arbitrary whitespace between words, will create an array of words without empty substrings.
For example, given a string:
" I dont know what you mean by glory Alice said."
I use:
str2.split(" ")
This also returns empty substrings:
["", "I", "dont", "know", "what", "you", "mean", "by", "glory", "", "Alice", "said."]
How can I filter out empty strings from an array?
Upvotes: 13
Views: 29830
Reputation: 3
I think the empty substrings appear because there are several whitespace characters in a row. Since replace() with a string pattern only replaces the first occurrence, you can call it in a loop until every double space has become a single space, then split() on a single space, like this:
// getting the full program text from the div
var program = document.getElementById("ans").textContent;
// removing multiple spaces: keep replacing a double space with a single one
var res = program;
while (res.indexOf("  ") !== -1) {
  res = res.replace("  ", " ");
}
// trim the remaining leading/trailing space, then split each word using a space as separator
var result = res.trim().split(" ");
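A single global regular-expression replace can do the same job as the loop, because the `g` flag replaces every match instead of only the first. A minimal sketch, assuming the text has already been read into a plain string (the DOM lookup is left out); `collapseAndSplit` is a hypothetical helper name:

```javascript
// Hypothetical helper (not part of any library): collapse each run of
// whitespace into one space, trim the ends, then split on single spaces.
function collapseAndSplit(text) {
  var collapsed = text.replace(/\s+/g, " ").trim();
  // An empty or all-whitespace input would otherwise split into [""].
  return collapsed === "" ? [] : collapsed.split(" ");
}

var tokens = collapseAndSplit(" I dont know what you mean by glory  Alice said.");
// tokens is ["I", "dont", "know", "what", "you", "mean", "by", "glory", "Alice", "said."]
```

Unlike the `"  "` loop, `\s+` also collapses tabs and newlines, which `textContent` can easily contain.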
Upvotes: 0
Reputation: 154828
I recommend .match:
str.match(/\b\w+\b/g);
This matches runs of word characters between word boundaries, so spaces are never matched and cannot end up in the resulting array (punctuation such as the final "." is dropped as well).
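A minimal sketch of this approach. `String.prototype.match` with a global regex returns `null`, not an empty array, when nothing matches, so the `|| []` fallback is an added assumption about how that case should be handled; `matchWords` is a hypothetical helper name:

```javascript
// Hypothetical helper: return every run of word characters ([A-Za-z0-9_]).
// match() with the g flag returns null when there is no match, hence || [].
function matchWords(text) {
  return text.match(/\b\w+\b/g) || [];
}

var sample = " I dont know what you mean by glory  Alice said.";
var found = matchWords(sample);
// found is ["I", "dont", "know", "what", "you", "mean", "by", "glory", "Alice", "said"]
// ("said." loses its period, because "." is not a word character)

var none = matchWords("   ");        // [] instead of null
var apostrophe = matchWords("don't"); // ["don", "t"]: the apostrophe is not \w
```

Because a greedy `\w+` always starts and ends at a word boundary, the `\b` anchors do not change the result here; `/\w+/g` returns the same array.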
Upvotes: 2
Reputation: 214949
str.match(/\S+/g)
returns a list of non-space sequences ["I", "dont", "know", "what", "you", "mean", "by", "glory", "Alice", "said."]
(note that this includes the dot in "said.")
str.match(/\w+/g)
returns a list of all words: ["I", "dont", "know", "what", "you", "mean", "by", "glory", "Alice", "said"]
(See the documentation for match().)
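The difference between the two patterns shows up once the text has punctuation or letters outside ASCII, since `\w` only covers `[A-Za-z0-9_]`. A small comparison on a made-up sample string; the third, Unicode-aware pattern (`\p{L}`/`\p{N}` with the `u` flag, available since ES2018) is an addition, not part of the answer above:

```javascript
// Made-up sample; \u00e9 is "é".
var text = "Alice said: \"caf\u00e9 au lait, please.\"";

// \S+ : every run of non-whitespace, punctuation included.
var nonSpace = text.match(/\S+/g) || [];
// nonSpace is ["Alice", "said:", "\"café", "au", "lait,", "please.\""]

// \w+ : ASCII word characters only, so "café" is cut to "caf".
var asciiWords = text.match(/\w+/g) || [];
// asciiWords is ["Alice", "said", "caf", "au", "lait", "please"]

// Unicode letters and digits (ES2018 property escapes, needs the u flag).
var unicodeWords = text.match(/[\p{L}\p{N}]+/gu) || [];
// unicodeWords is ["Alice", "said", "café", "au", "lait", "please"]
```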
Upvotes: 11
Reputation: 258
You should trim the string before using split.
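var str = " I dont know what you mean by glory  Alice said.";
// strip leading and trailing whitespace (String.prototype.trim() does the same)
var trimmed = str.replace(/^\s+|\s+$/g, '');
// split the trimmed string, not the original; note that a run of spaces
// inside the string still produces an empty string
var words = trimmed.split(" ");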
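Trimming on its own only removes the empty strings at the ends; a run of several spaces in the middle of the string still produces "". A sketch that combines the built-in `String.prototype.trim()` with splitting on a whitespace regex instead of a single space:

```javascript
var str = " I dont know what you mean by glory  Alice said.";

// trim() strips leading/trailing whitespace; splitting on /\s+/ then treats
// any run of spaces, tabs or newlines as one separator.
var words = str.trim().split(/\s+/);
// words is ["I", "dont", "know", "what", "you", "mean", "by", "glory", "Alice", "said."]

// Edge case: "".split(/\s+/) returns [""], so guard all-whitespace input.
var blank = "   ".trim();
var noWords = blank === "" ? [] : blank.split(/\s+/);
// noWords is []
```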
Upvotes: 7
Reputation: 21443
See the filter method: http://www.hunlock.com/blogs/Mastering_Javascript_Arrays#quickIDX13
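Applied to the question, `Array.prototype.filter` keeps only the elements for which the callback returns true, so the empty strings left by `split(" ")` can be dropped afterwards. A minimal sketch:

```javascript
var str = " I dont know what you mean by glory  Alice said.";
var parts = str.split(" ");
// parts is ["", "I", "dont", "know", "what", "you", "mean", "by", "glory", "", "Alice", "said."]

// Keep only the non-empty strings.
var words = parts.filter(function (s) {
  return s !== "";
});

// Shorthand: "" is falsy, so passing Boolean as the callback drops it too.
var wordsShort = parts.filter(Boolean);
// both are ["I", "dont", "know", "what", "you", "mean", "by", "glory", "Alice", "said."]
```

`filter(Boolean)` would also drop other falsy values, but `split` only ever returns strings, so here the two forms are equivalent.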
Upvotes: 0
Reputation: 44215
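You probably don't even need to filter; just split using this regular expression (if the string can start with whitespace, trim it first, because a leading space is not preceded by a word character and stays attached to the first word):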
" I dont know what you mean by glory Alice said.".split(/\b\s+/)
Upvotes: 18
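The `\b` in front of `\s+` in the split above means a whitespace run only counts as a separator when it directly follows a word character. That has two side effects worth checking against your input: a leading space is never split off, and whitespace after punctuation is not split either. A small demonstration (the second sample string is made up):

```javascript
var str = " I dont know what you mean by glory  Alice said.";

// The leading space has no word character before it, so it stays attached.
var raw = str.split(/\b\s+/);
// raw[0] is " I"

// Trimming first removes it.
var words = str.trim().split(/\b\s+/);
// words is ["I", "dont", "know", "what", "you", "mean", "by", "glory", "Alice", "said."]

// A space after "." is not preceded by a word character, so no split happens.
var afterDot = "Alice said. Then left.".split(/\b\s+/);
// afterDot is ["Alice", "said. Then", "left."]
```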