Reputation: 4106
I came across the following:
public int indexOf(int ch)
as per http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#indexOf(int) while I was revising some String-related Java concepts.
As far as I know, when we use the indexOf() method of java.lang.String, the parameter is supposed to be a char, so I was expecting the signature to be
public int indexOf(char ch)
So, please explain why it is public int indexOf(int ch).
Upvotes: 3
Views: 1476
Reputation: 7324
Every char has an int value that you can use to get that char, and you can convert a char to an int the same way, by assigning the char to an int variable. Try the following lines:
char ch = 65;
System.out.println(ch);
int i = 'A';
System.out.println(i);
Below I'm using char values in a loop, which is allowed precisely because every char has an int value. Try this code; it prints the letters from A to Z along with their equivalent int values:
for (char j = 'A'; j <= 'Z'; j++) {
    System.out.println("int " + ((int) j) + " = " + j);
}
Upvotes: 1
Reputation: 100051
Unicode contains many more than 2^16 characters. Java 'char' and 'String' use a Unicode Transformation Format (UTF-16) to represent the full set of characters. Characters in the Basic Multilingual Plane (BMP) are represented as a single 16-bit 'char'. The rest are represented by a surrogate pair: two special 16-bit values from a set reserved for this purpose.
An alternative representation is UTF-32. In this representation, each character is a single 32-bit item, period.
For example, Cuneiform is out there in the SMP (Supplementary Multilingual Plane); the first character of the block is U+12000. In UTF-32, that's just 0x12000. In UTF-16, it's "\uD808\uDC00".
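To make the surrogate pair concrete, here is a minimal sketch that prints the UTF-16 code units Java uses for a given code point. The class name Utf16Demo and the helper utf16Hex are made up for illustration; Character.toChars is the standard library call doing the conversion.

```java
class Utf16Demo {
    // Returns the UTF-16 code units for a code point as space-separated hex.
    // (utf16Hex is a hypothetical helper, not part of the JDK.)
    static String utf16Hex(int codePoint) {
        StringBuilder sb = new StringBuilder();
        for (char unit : Character.toChars(codePoint)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%04X", (int) unit));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A supplementary character needs two chars (a surrogate pair).
        System.out.println(utf16Hex(0x12000)); // D808 DC00
        // A BMP character fits in a single char.
        System.out.println(utf16Hex('A'));     // 0041
    }
}
```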
The Character and String classes (amongst others) provide a few methods that operate on UTF-32 characters for convenience. You're asking about one of them. Whenever you see 'int' as the datatype of a character, that's what the 'int' contains: a UTF-32 value, i.e. a Unicode code point. It's not hard to see how it can be more convenient to do some operations with a single UTF-32 value instead of a pair of surrogates.
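As a rough illustration of why the int parameter matters for indexOf, the sketch below searches for U+12000, which cannot be passed as a single char. The class name IndexOfDemo and the sample string are assumptions chosen for the example; the String methods used are the standard ones.

```java
class IndexOfDemo {
    // "ab", then U+12000 (stored as a surrogate pair), then "c".
    static final String TEXT =
            "ab" + new String(Character.toChars(0x12000)) + "c";

    public static void main(String[] args) {
        // length() counts 16-bit chars, so the pair counts twice.
        System.out.println(TEXT.length());                         // 5
        // codePointCount() counts whole characters (code points).
        System.out.println(TEXT.codePointCount(0, TEXT.length())); // 4
        // indexOf(int) accepts the full code point and returns the
        // char index of the first half of the surrogate pair.
        System.out.println(TEXT.indexOf(0x12000));                 // 2
        // A char argument is simply widened to int, so this still works.
        System.out.println(TEXT.indexOf('c'));                     // 4
    }
}
```

Note that the returned indices are still char (UTF-16 code unit) positions, not code point positions.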
Note that this has nothing to do with composed and non-composed accents. á can be represented in Unicode as either one or two UTF-16 characters, but there are no surrogates involved. All three of U+0061 (a), U+00E1 (a with precomposed accent), and U+0301 (combining acute accent) are ordinary BMP characters. So, even in UTF-32, you can have a two-item sequence: U+0061, U+0301.
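The distinction can be checked with a short sketch like the one below. The class name CombiningDemo and the helper hasSurrogates are hypothetical; java.text.Normalizer and Character.isSurrogate are standard JDK APIs (Java 7 and later).

```java
import java.text.Normalizer;

class CombiningDemo {
    // Letter a followed by the combining acute accent: two BMP characters.
    static final String DECOMPOSED = "a\u0301";
    // The single precomposed character.
    static final String PRECOMPOSED = "\u00E1";

    // Hypothetical helper: true if any char in s is a surrogate.
    static boolean hasSurrogates(String s) {
        for (char c : s.toCharArray()) {
            if (Character.isSurrogate(c)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Two chars and also two code points: no surrogate pair here.
        System.out.println(DECOMPOSED.length());                          // 2
        System.out.println(DECOMPOSED.codePointCount(0, 2));              // 2
        System.out.println(hasSurrogates(DECOMPOSED));                    // false
        // NFC normalization composes the pair into the single character.
        String composed = Normalizer.normalize(DECOMPOSED, Normalizer.Form.NFC);
        System.out.println(composed.equals(PRECOMPOSED));                 // true
    }
}
```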
The ICU4J library provides a more complete set of UTF-32 classes and methods.
Upvotes: 3