Vikram

Reputation: 4106

Java - public int indexOf(int ch)

I came across the following:

public int indexOf(int ch)

in http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#indexOf(int) while I was revising some String-related Java concepts.

As far as I know, the argument we pass to indexOf() on a java.lang.String is a char, so I expected the signature to be

public int indexOf(char ch)

So, please explain why it is public int indexOf(int ch).

Upvotes: 3

Views: 1476

Answers (2)

Muhammad

Reputation: 7324

Every char has an int value. You can use that int value to get the char, and you can convert a char to an int the same way, by assigning the char to an int variable. Try the following lines:

char ch = 65;
System.out.println(ch);
int i = 'A';
System.out.println(i);

The loop below uses char values as its counter, which is allowed precisely because every char has an int value. Try this code; it prints the letters A to Z with their equivalent int values:

    for(char j = 'A'; j <= 'Z'; j++){
        System.out.println("int "+((int) j)+" = "+j);
    }

Upvotes: 1

bmargulies

Reputation: 100051

Unicode contains many more than 2^16 characters. Java 'char' and 'String' use a Unicode Transformation Format (UTF-16) to represent the full set of characters. Characters in the Basic Multilingual Plane (BMP) are represented as a single 16-bit 'char'. The rest are represented by a surrogate pair: two special 16-bit values from a set reserved for this purpose.

An alternative representation is UTF-32. In this representation, each character is a single 32-bit item, period.

For example, Cuneiform is out there in the Supplementary Multilingual Plane (SMP); the first character of the block is U+12000. In UTF-32, that's just 0x12000. In UTF-16, it's "\uD808\uDC00".
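To check the encoding above for yourself, here is a minimal sketch. The class name SurrogateDemo and the helper utf16Hex are made up for illustration; Character.toChars is the standard JDK method that turns a code point into its UTF-16 code units.

```java
public class SurrogateDemo {
    // Hypothetical helper: encodes a code point as UTF-16 and returns
    // the code units as space-separated hex.
    static String utf16Hex(int codePoint) {
        StringBuilder sb = new StringBuilder();
        for (char unit : Character.toChars(codePoint)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%04X", (int) unit));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(utf16Hex(0x12000)); // supplementary: D808 DC00
        System.out.println(utf16Hex('A'));     // BMP: 0041
    }
}
```

A BMP character comes back as one code unit, while U+12000 comes back as the surrogate pair quoted above.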

The Character and String classes (amongst others) provide a few methods that operate on UTF-32 characters for convenience. You're asking about one of them. Whenever you see 'int' as the datatype of a character, that's what the 'int' contains: a UTF-32 value, that is, a whole Unicode code point. It's not hard to see how it can be more convenient to do some operations with a single UTF-32 value instead of a pair of surrogates.
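As a rough illustration of why the int parameter matters, the sketch below searches for U+12000 in a string. The class name IndexOfCodePoint and the sample text are assumptions for the example; indexOf(int), length() and codePointCount are the real String methods.

```java
public class IndexOfCodePoint {
    // Sample text (made up): "ab", then U+12000 as a surrogate pair, then "c".
    static final String TEXT = "ab\uD808\uDC00c";

    public static void main(String[] args) {
        int cuneiform = 0x12000; // does not fit in a 16-bit char

        // length() counts chars (UTF-16 code units), not characters.
        System.out.println(TEXT.length());                          // 5
        System.out.println(TEXT.codePointCount(0, TEXT.length())); // 4

        // indexOf(int) accepts the whole code point and returns the
        // index of the first char of the surrogate pair.
        System.out.println(TEXT.indexOf(cuneiform));                // 2

        // A plain char argument is widened to int automatically,
        // so the usual call style keeps working.
        System.out.println(TEXT.indexOf('c'));                      // 4
    }
}
```

With a char parameter, the first search could not be written as a single call, because no single char holds U+12000.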

Note that this has nothing to do with composed and non-composed accents. á can be represented in Unicode as either one or two UTF-16 characters, but there are no surrogates involved. All three of U+0061 (a), U+00E1 (a with precomposed accent), and U+0301 (composing acute accent) are ordinary BMP characters. So, even in UTF-32, you can have a two-item sequence: U+0061, U+0301.
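A small sketch of the accent point, using java.text.Normalizer from the JDK to switch between the two forms. The class name AccentForms and its helpers are hypothetical names for this example.

```java
import java.text.Normalizer;

public class AccentForms {
    // Decomposed form: base letter followed by combining marks.
    static String decompose(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD);
    }

    // Precomposed form where one exists.
    static String compose(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    // True if any char in the string is half of a surrogate pair.
    static boolean hasSurrogates(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (Character.isSurrogate(s.charAt(i))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String precomposed = "\u00E1";        // a with acute, one char
        String decomposed = decompose(precomposed);

        System.out.println(precomposed.length());           // 1
        System.out.println(decomposed.length());            // 2
        System.out.println(decomposed.equals("a\u0301"));   // true
        System.out.println(hasSurrogates(decomposed));      // false
    }
}
```

Both forms consist only of BMP chars, so the two-item sequence has nothing to do with surrogate pairs.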

The ICU4J library provides a more complete set of UTF-32 classes and methods.

Upvotes: 3
