Reputation: 29869
Based off Regex Until But Not Including, I'm trying to match all characters up until a word boundary.
For example, matching apple in the following string:
apple<
I'm doing that using a negated set [^], a word boundary \b, and a + repeater.
Like this:
/a[^\b]+/
Which should look for an "a" and then grab one or more matches for any character that is not a word boundary. So I would expect it to stop before <, which is at the end of the word.
var input = [ "apple<", "apple/" ];
var myRegex = /a[^\b]+/;
for (var i = 0; i < input.length; i++) {
console.log(myRegex.exec(input[i]));
}
Couple other regex strings I tried:
I can use a negated word boundary or a negated set with a regular word boundary:
/a[\B]+/
/a[^\b]+/
I can specify several possible word ending characters and use them in a negated set:
/a[^|"<>\-\\\/;:,.]+/
I can also look for a positive set and just restrict it to match regular letters:
/a[\w]+/
/a[a-zA-Z]+/
But I'd like to know how to do it for a word boundary if that's possible.
Here's MDN's listing of the word boundary and the characters that constitute it.
Upvotes: 0
Views: 3098
Reputation: 10899
If this rewording of the question is accurate (match all words beginning with 'a'), then you might have begun the search with existing SO answers like this one. Distilling that down, you could use the word character class \w, and make it a bit more bulletproof by adding a preceding word boundary \b to prevent matching partial words that include an 'a', such as 'baggage': /\ba\w+/gi
var input = [ "apple<", "apple/", "baggage;" ];
var myRegexWord = /\ba\w+/i;
var myRegexPartial = /a\w+/;
for (var i = 0; i < input.length; i++) {
console.log(myRegexWord.exec(input[i]));
console.log(myRegexPartial.exec(input[i]));
}
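The pattern above is written with the g flag, but the snippet calls exec without it, so only the first match per string is returned. A minimal sketch of the global form follows, assuming a made-up sample sentence (the variable names are illustrative, not from the question):

```javascript
// Hypothetical sample text, chosen to contain whole words and partial matches.
var text = "An apple and a baggage claim at Avocado airport";

// With \b: only words that begin with "a" (case-insensitive) match.
var wholeWords = text.match(/\ba\w+/gi);

// Without \b: fragments inside other words ("aggage", "aim") match too.
var partials = text.match(/a\w+/gi);

console.log(wholeWords);
console.log(partials);
```

Note that the lone word "a" is not matched, because \w+ requires at least one more word character after the initial "a"; \w* would include it.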
Upvotes: 1
Reputation: 6561
Word boundaries (\b) are not characters, but the empty string between a sequence of word characters (letters, digits, underscore) and any non-word character, or the start or end of the string. Moreover, since Unicode support is still lacking in JavaScript, "letter" means only ASCII letters.
Because of that, you:
- shouldn't rely on \b unless your data is some kind of computer language that can't possibly include Unicode
- can't usefully put a quantifier on \b (an empty string times 10 is still one empty string)
- can't negate \b with [^...] (it's not a character set, so it has no complement)
- can't use \b in a character set (in square brackets) since, again, it's not a character or character set; in JavaScript, [\b] matches a backspace character instead
Since \b doesn't actually add any characters to the match, you can safely append it to your regex:
/.+?\b/
will match all characters up until the first word boundary. It's in fact a superset of:
/\w+/
which is probably what you want, since you're interested only in the words, not the stuff in between.
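To make the difference concrete, here is a small sketch that runs the question's pattern next to the two suggested here, each prefixed with "a" as in the question. The variable names are illustrative only.

```javascript
// Inside square brackets, \b is the backspace character (U+0008),
// so [^\b]+ means "one or more characters that are not a backspace".
var negatedClass = /a[^\b]+/;
// "a", then as few characters as possible up to the next word boundary.
var lazyBoundary = /a.+?\b/;
// "a" followed by one or more word characters.
var wordChars = /a\w+/;

var samples = ["apple<", "apple/"];
var results = samples.map(function (s) {
  return {
    input: s,
    negatedClass: negatedClass.exec(s)[0],
    lazyBoundary: lazyBoundary.exec(s)[0],
    wordChars: wordChars.exec(s)[0]
  };
});
console.log(results);
```

The negated class swallows the trailing < or / because neither is a backspace, while the other two patterns stop at the end of the word.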
Upvotes: 6
Reputation: 30985
You have to include the word boundary as part of your regex like this:
/[A-Za-z]+\b/
You could also use:
\w+\b
Although this will also treat digits and the underscore as part of your word.
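A quick sketch of how the two patterns differ, assuming a made-up input that contains an underscore:

```javascript
// Hypothetical input: an identifier with an underscore, followed by "<".
var sample = "foo_bar<";

// Letters only: there is no word boundary between "foo" and "_"
// (both are word characters), so the first successful match is "bar".
var lettersOnly = /[A-Za-z]+\b/.exec(sample)[0];

// \w includes the underscore, so the whole identifier matches.
var wordCharsOnly = /\w+\b/.exec(sample)[0];

console.log(lettersOnly);   // "bar"
console.log(wordCharsOnly); // "foo_bar"
```

Which one is preferable depends on whether identifiers like foo_bar should count as a single word in your data.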
Upvotes: 1