Reputation: 20215
I have done some searching, but I couldn't find a definitive list of the whitespace characters matched by \s in JavaScript's regex.
I know that I can rely on space, line feed, carriage return, and tab being whitespace, but I thought that since JavaScript was traditionally only for the browser, maybe URL-encoded whitespace and things like &nbsp; and %20 would be supported as well.
What exactly is considered whitespace by JavaScript's regex compiler? If there are differences between browsers, I only really care about WebKit browsers, but it would be nice to know of any differences. Also, what about Node.js?
Upvotes: 11
Views: 13967
Reputation: 154838
A simple test:
// Print every code point whose one-character string is removed entirely by \s.
for (var i = 0; i <= 0x10FFFF; i++) {
    if (String.fromCodePoint(i).replace(/\s+/, "") === "") console.log(i);
}
The char codes (Chrome):
9
10
11
12
13
32
160
5760
8192
8193
8194
8195
8196
8197
8198
8199
8200
8201
8202
8232
8233
8239
8287
12288
65279
Upvotes: 11
Reputation: 33730
Here's an expansion of primvdb's answer, covering the entire 16-bit space, including the Unicode code point values in hexadecimal and a comparison with str.trim(). I tried to edit the answer to improve it, but my edit was rejected, so I had to post this new one.
Identify all characters that fit in a single UTF-16 code unit and are matched as whitespace by the regex \s or stripped by String.prototype.trim():
const regexList = [];
const trimList = [];

for (let codePoint = 0; codePoint < 2 ** 16; codePoint += 1) {
    const str = String.fromCodePoint(codePoint);
    // Four-digit hexadecimal form of the code point, e.g. 00a0
    const unicode = codePoint.toString(16).padStart(4, '0');
    if (str.replace(/\s/, '') === '') regexList.push([codePoint, unicode]);
    if (str.trim() === '') trimList.push([codePoint, unicode]);
}

// Do \s and trim() agree on every character?
const identical = JSON.stringify(regexList) === JSON.stringify(trimList);
const list = regexList.reduce((str, [codePoint, unicode]) => `${str}${unicode} ${codePoint}\n`, '');

console.log({identical});
console.log(list);
The list (in V8):
0009 9
000a 10
000b 11
000c 12
000d 13
0020 32
00a0 160
1680 5760
2000 8192
2001 8193
2002 8194
2003 8195
2004 8196
2005 8197
2006 8198
2007 8199
2008 8200
2009 8201
200a 8202
2028 8232
2029 8233
202f 8239
205f 8287
3000 12288
feff 65279
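The list above can also be written out as an explicit character class and checked against \s. This is only a sketch under assumptions: the names EXPLICIT_WHITESPACE and findMismatches are made up for illustration, and the class simply transcribes the V8 output shown above, so an engine built against a different Unicode version may report differences.

```javascript
// Explicit character class transcribed from the code points listed above.
// Assumption: the engine's \s matches exactly this set (as in the V8 output shown).
const EXPLICIT_WHITESPACE =
  /[\t\n\v\f\r \u00a0\u1680\u2000-\u200a\u2028\u2029\u202f\u205f\u3000\ufeff]/;

// Hypothetical helper: collects every 16-bit code unit on which
// the explicit class and \s disagree.
function findMismatches() {
  const mismatches = [];
  for (let code = 0; code < 2 ** 16; code += 1) {
    const ch = String.fromCharCode(code);
    if (EXPLICIT_WHITESPACE.test(ch) !== /\s/.test(ch)) mismatches.push(code);
  }
  return mismatches;
}

// An empty array means the two agree on this engine.
console.log(findMismatches());
```

Spelling the class out like this is mostly useful when you want to add or remove a character (for example, to exclude the no-break space) rather than rely on whatever \s covers.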
Upvotes: 3
Reputation: 3414
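In Firefox, \s matches a single whitespace character, including space, tab, form feed, and line feed. The documentation linked below gave it as equivalent to [ \f\n\r\t\v\u00A0\u2028\u2029], although the ECMAScript 5 specification also counts the byte order mark (\uFEFF) and the other Unicode space separators (such as \u2003 and \u3000) as whitespace, so current engines match a larger set.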
For example, /\s\w*/ matches ' bar' in "foo bar."
https://developer.mozilla.org/en/JavaScript/Guide/Regular_Expressions
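To see how the quoted character class relates to what an engine actually matches, the two can be compared directly. This is a rough sketch, not part of the quoted documentation: QUOTED_CLASS copies the class quoted above, and the sample characters and the report shape are chosen purely for illustration.

```javascript
// The example from the quoted documentation: \s\w* applied to "foo bar."
const match = 'foo bar.'.match(/\s\w*/);
console.log(JSON.stringify(match[0]));

// The character class quoted above, copied verbatim.
const QUOTED_CLASS = /[ \f\n\r\t\v\u00A0\u2028\u2029]/;

// Sample characters (illustrative): a tab, a no-break space, and an em space.
const samples = ['\t', '\u00A0', '\u2003'];
const report = samples.map((ch) => ({
  codePoint: ch.codePointAt(0),
  quotedClass: QUOTED_CLASS.test(ch),
  backslashS: /\s/.test(ch),
}));

// Rows where the two flags differ are characters that \s matches
// but the quoted class does not.
console.log(report);
```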
Upvotes: 1
Reputation: 16597
HTML != JavaScript. JavaScript is completely literal: %20 is just the three characters %, 2, and 0, and &nbsp; is just the string of characters &, n, b, s, p, and ;. As for character classes, I consider nearly everything that works in Perl regexes to be applicable in JS (with exceptions: for example, named groups were not available before ES2018).
http://www.regular-expressions.info/javascript.html is the reference I use.
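A quick way to convince yourself that the encoded forms are plain text as far as the regex engine is concerned is to test them before and after decoding. This is a minimal sketch; hasWhitespace is a hypothetical helper name introduced only for this example.

```javascript
// Hypothetical helper for illustration: does the string contain anything \s matches?
const hasWhitespace = (str) => /\s/.test(str);

console.log(hasWhitespace('%20'));                     // false: three literal characters
console.log(hasWhitespace(decodeURIComponent('%20'))); // true: decoded to an actual space
console.log(hasWhitespace('&nbsp;'));                  // false: six literal characters
console.log(hasWhitespace('\u00A0'));                  // true: an actual no-break space
```

In other words, any decoding (URL or HTML entity) has to happen before the regex runs; \s itself never interprets escapes from other languages.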
Upvotes: 3