Reputation: 1390
I was asked to split a string on all characters for which Character.isWhitespace returns true.
Is Character.isWhitespace equivalent to:
c == ' ' || c == '\t' || c == '\r' || c == '\n'
EDIT:
I'd be glad if you could help me think of a neat way to split a text by this criterion.
Upvotes: 1
Views: 930
Reputation: 328735
The easiest way to split on such characters is:
String[] words = input.split("\\p{javaWhitespace}+");
It is documented in the Pattern javadoc:
\p{javaWhitespace}
Equivalent to java.lang.Character.isWhitespace()
In particular, splitting on \\s is not equivalent: by default \s matches only [ \t\n\x0B\f\r], so it will not split on \u001C...\u001F.
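A minimal sketch of the difference, assuming the default regex flags (no UNICODE_CHARACTER_CLASS). The class name WhitespaceSplitDemo and both helper method names are made up for illustration; only String.split and the two patterns come from the answer above.

```java
public class WhitespaceSplitDemo {
    // Splits on runs of characters for which Character.isWhitespace returns true.
    static String[] splitOnJavaWhitespace(String input) {
        return input.split("\\p{javaWhitespace}+");
    }

    // Splits on runs of the default \s class, i.e. [ \t\n\x0B\f\r].
    static String[] splitOnRegexWhitespace(String input) {
        return input.split("\\s+");
    }

    public static void main(String[] args) {
        // U+001C (FILE SEPARATOR) sits between "alpha" and "beta".
        String input = "alpha" + (char) 0x1C + "beta gamma\tdelta";

        // \p{javaWhitespace} treats U+001C as a delimiter: 4 tokens.
        System.out.println(splitOnJavaWhitespace(input).length);  // prints 4

        // \s does not, so "alpha" and "beta" stay glued together: 3 tokens.
        System.out.println(splitOnRegexWhitespace(input).length); // prints 3
    }
}
```

Note that String.split returns an empty first element when the input starts with a delimiter. On Java 11 or later, calling input.strip() first should avoid that, since strip() uses the same Character.isWhitespace definition.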
Upvotes: 1
Reputation: 11994
You can read more about it in the Javadoc for Character.isWhitespace.
Taken from there:
Determines if the specified character is white space according to Java. A character is a Java whitespace character if and only if it satisfies one of the following criteria:
It is a Unicode space character (SPACE_SEPARATOR, LINE_SEPARATOR, or PARAGRAPH_SEPARATOR) but is not also a non-breaking space ('\u00A0', '\u2007', '\u202F').
It is '\t', U+0009 HORIZONTAL TABULATION.
It is '\n', U+000A LINE FEED.
It is '\u000B', U+000B VERTICAL TABULATION.
It is '\f', U+000C FORM FEED.
It is '\r', U+000D CARRIAGE RETURN.
It is '\u001C', U+001C FILE SEPARATOR.
It is '\u001D', U+001D GROUP SEPARATOR.
It is '\u001E', U+001E RECORD SEPARATOR.
It is '\u001F', U+001F UNIT SEPARATOR.
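To make the answer to the original question concrete, here is a small sketch that compares the four-character test from the question against Character.isWhitespace. The class name IsWhitespaceCheck and the helpers isNaiveWhitespace and disagree are hypothetical names chosen for this example.

```java
public class IsWhitespaceCheck {
    // The four-character test from the question (hypothetical helper).
    static boolean isNaiveWhitespace(char c) {
        return c == ' ' || c == '\t' || c == '\r' || c == '\n';
    }

    // True when the naive test and Character.isWhitespace disagree on c.
    static boolean disagree(char c) {
        return isNaiveWhitespace(c) != Character.isWhitespace(c);
    }

    public static void main(String[] args) {
        char[] samples = {
            ' ',            // SPACE: both tests say true
            '\f',           // FORM FEED: only Character.isWhitespace says true
            (char) 0x0B,    // VERTICAL TABULATION: only Character.isWhitespace
            (char) 0x1C,    // FILE SEPARATOR: only Character.isWhitespace
            (char) 0x2003,  // EM SPACE (a SPACE_SEPARATOR): only Character.isWhitespace
            (char) 0xA0     // NO-BREAK SPACE: both tests say false
        };
        for (char c : samples) {
            System.out.printf("U+%04X naive=%b java=%b%n",
                    (int) c, isNaiveWhitespace(c), Character.isWhitespace(c));
        }
    }
}
```

So the two tests are not equivalent: the four-character check misses form feed, vertical tab, the U+001C to U+001F separators, and the breaking Unicode space characters.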
Upvotes: 3