Reputation: 5046
I need to do some processing with code points and carriage returns. I have a function that takes a char's code point, and if it is \r it needs to behave differently. I've got this:
if (codePoint == Character.codePointAt(new char[] {'\r'}, 0)) {
but that is very ugly and certainly not the right way to do it. What is the correct method of doing this?
(I know that I could hardcode the number 13, the decimal value of \r, and use that, but doing so would make it unclear what I am doing...)
Upvotes: 4
Views: 1149
Reputation: 16359
I know this question is old, but none of the existing answers, including the accepted one, actually answers the question.
You can simply compare a code point with a char directly. The char operand is promoted to int before the comparison, so '\r' compares equal to the code point 13:
if (codePoint == '\r')
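A minimal runnable sketch of that comparison. The helper name isCarriageReturn and the class name are hypothetical, and the code points come from String.codePointAt here, but any int code point behaves the same way:

```java
public class CarriageReturnCheck {

    // Hypothetical helper: true if the code point is U+000D (carriage return).
    // The char literal '\r' is promoted to int before the == comparison.
    static boolean isCarriageReturn(int codePoint) {
        return codePoint == '\r';
    }

    public static void main(String[] args) {
        String text = "a\r\nb";
        // Walk the string one code point at a time.
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            if (isCarriageReturn(codePoint)) {
                System.out.println("carriage return at index " + i);
            }
            i += Character.charCount(codePoint);
        }
    }
}
```

Running it prints "carriage return at index 1". No named constant or conversion call is needed, because the literal '\r' already documents the intent.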
Upvotes: 0
Reputation: 1500385
If you know that all your input is going to be in the Basic Multilingual Plane (U+0000 to U+FFFF), then you can just use:
char character = 'x';
int codePoint = character;
That uses the implicit conversion from char to int, as specified in JLS 5.1.2:
19 specific conversions on primitive types are called the widening primitive conversions:
- ...
- char to int, long, float, or double
- ...
A widening conversion of a char to an integral type T zero-extends the representation of the char value to fill the wider format.
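To make "zero-extends" concrete, here is a small sketch (the class and method names are illustrative only). The largest char, Character.MAX_VALUE (U+FFFF), widens to 65535, whereas a short holding the same 16 bits is sign-extended to -1:

```java
public class WideningDemo {

    // Implicit char -> int widening (JLS 5.1.2): no cast is required.
    static int widen(char c) {
        return c;
    }

    public static void main(String[] args) {
        // Character.MAX_VALUE is U+FFFF, the char with all 16 bits set.
        System.out.println(widen(Character.MAX_VALUE)); // 65535 (zero-extended)

        // The same 16 bits stored in a short are sign-extended when widened.
        short sameBits = (short) 0xFFFF;
        int signExtended = sameBits;
        System.out.println(signExtended);                // -1

        System.out.println(widen('\r'));                 // 13
    }
}
```

Because char is unsigned, the widened value is never negative, which is why it can be compared directly with a code point.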
However, a char is only a UTF-16 code unit. The point of Character.codePointAt is that it copes with code points outside the BMP, which are represented by a surrogate pair: two UTF-16 code units that together encode a single code point.
From JLS 3.1:
The Unicode standard was originally designed as a fixed-width 16-bit character encoding. It has since been changed to allow for characters whose representation requires more than 16 bits. The range of legal code points is now U+0000 to U+10FFFF, using the hexadecimal U+n notation. Characters whose code points are greater than U+FFFF are called supplementary characters. To represent the complete range of characters using only 16-bit units, the Unicode standard defines an encoding called UTF-16. In this encoding, supplementary characters are represented as pairs of 16-bit code units, the first from the high-surrogates range, (U+D800 to U+DBFF), the second from the low-surrogates range (U+DC00 to U+DFFF). For characters in the range U+0000 to U+FFFF, the values of code points and UTF-16 code units are the same.
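As an illustration of that encoding, a sketch using U+1F600, a supplementary character chosen only as an example. It occupies two chars in a String, and widening either char alone gives a surrogate value, not the character's code point:

```java
public class SurrogateDemo {

    // U+1F600 lies outside the BMP, so UTF-16 stores it as a surrogate pair.
    static final String SMILEY = new String(Character.toChars(0x1F600));

    public static void main(String[] args) {
        System.out.println(SMILEY.length());                             // 2 code units
        System.out.println(Character.isHighSurrogate(SMILEY.charAt(0))); // true
        System.out.println(Character.isLowSurrogate(SMILEY.charAt(1)));  // true

        // Widening one char only yields the high surrogate, not the character.
        int firstUnit = SMILEY.charAt(0);
        System.out.printf("U+%X%n", firstUnit);                          // U+D83D

        // Character.codePointAt joins the pair back into a single code point.
        int codePoint = Character.codePointAt(SMILEY, 0);
        System.out.printf("U+%X%n", codePoint);                          // U+1F600
        System.out.println(SMILEY.codePointCount(0, SMILEY.length()));   // 1
    }
}
```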
If you need to cope with supplementary characters, you'll need the more complicated code, such as Character.codePointAt.
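One way to write that more general code, sketched under the assumption that the input is a String. String.codePoints() (Java 8 and later) yields whole code points, so surrogate pairs are never split, and each one can still be compared against '\r' directly. The method name countCarriageReturns is hypothetical:

```java
public class CodePointScan {

    // Hypothetical helper: counts U+000D code points in the input.
    // codePoints() decodes surrogate pairs, so supplementary characters
    // arrive as single int values rather than as two separate chars.
    static long countCarriageReturns(String text) {
        return text.codePoints()
                   .filter(cp -> cp == '\r')
                   .count();
    }

    public static void main(String[] args) {
        String text = "line1\r\n" + new String(Character.toChars(0x1F600)) + "\r\n";
        System.out.println(text.length());                // 11 UTF-16 code units
        System.out.println(text.codePoints().count());    // 10 code points
        System.out.println(countCarriageReturns(text));   // 2
    }
}
```

The comparison itself is unchanged from the BMP-only case; only the iteration has to be code-point aware.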
Upvotes: 6
Reputation: 201439
If I understand your question, you could simply cast the char to an int, something like this:
char ch = '\r';
int codePoint = (int) ch;
System.out.println(codePoint);
Output is:
13
Upvotes: 4