Reputation: 841
The compareTo() method in Java compares two strings "lexicographically". Can someone please simply explain how the lexicographic comparison works in java?
I found this post that explains the three cases of <0 , ==0, and >0 ; However, I am still confused...
Does this mean that the int returned is the number of places away the strings are form one another if they were to be sorted alphabetically like a dictionary?
Also, how does the method deal with case sensitivity? Are lower case letters first in line before uppercase? Is there a chart for this?
For example, the below code produces an output of -31. Does this mean that the string Dog is -31 places away from the string cat?
public static void main(String[] args) {
Scanner keyboard = new Scanner(System.in);
String str1 = "Dog";
String str2 = "cat";
int result = str1.compareTo(str2);
System.out.println(result);
Upvotes: 5
Views: 14777
Reputation: 9334
I found Wikipedia's Definition of Lexicographical order very useful in answering your question.
In a simplistic manner, the comparison is a numeric result from doing an alphabetic comparison. In alphabetic comparison, we compare the ordered set of letters that make up a sequence (usually words or strings). The return value will be 0 if the two are equal, and < or > depending on which value is alphabetically before or after the other.
take a list of words:
If we compare these, we take the first character of each and look. When we compare 'cat' and 'dog', we take the first character 'c' and 'd' and compare them. Numerically in code, the easy(not necessarily best) way to do this is to convert them to a numeric value and subtract one value from the other. This will equal 0 if they are the same, and we'll go to comparing the next character in each. If they're different, then we know that one is lexicographically (alphabetically) after the other.
The return value is not required to have any insightful information. That's why the only values that mean anything are <0 , ==0, and >0.
With regards to casing, that is an implementation detail - There are comparators that will consider an upper case 'A' to be the same as a lower case 'a', and there are comparators that don't, since they have different numerical values. (See: How to sort alphabetically while ignoring case sensitive?).
Upvotes: 1
Reputation: 21004
The value returned does not really matter as the compareTo
contract is to return negative, positive or 0 (as you already know).
However, if really you want to understand why -31
is returned when comparing Dog
with cat
(or any other string) then you could simply look at the method directly in String
class :
public int compareTo(String anotherString) {
int len1 = value.length;
int len2 = anotherString.value.length;
int lim = Math.min(len1, len2);
char v1[] = value;
char v2[] = anotherString.value;
int k = 0;
while (k < lim) {
char c1 = v1[k];
char c2 = v2[k];
if (c1 != c2) {
return c1 - c2;
}
k++;
}
return len1 - len2;
}
Keep in mind that value
is the char
array backing the string.
private final char value[];
So how does this method proceed ?
lim
.char
can be represented as int
value (which take their ascii value) and are already ordered. Thus when substracting a negative number will be returned if the second char is "higher" then the first one. A positive will be returned if the second char is "lower" then the first one. 0 will be returned if both are equals. In your example, first letter of both words are not equals so you get to compare D
with c
which are respectively represented as 68
and 99
. Substract 99 to 68 and you get -31.
So to answer this question :
Does this mean that the int returned is the number of places away the strings are form one another if they were to be sorted alphabetically like a dictionary?
No, it is actually either the difference between two non-matching char's ascii value or the difference of both length.
Also, how does the method deal with case sensitivity? Are lower case letters first in line before uppercase? Is there a chart for this?
If you want to ignore the case when comparing, you can use String#compareToIgnoreCase
.
Also you can check this chart for ascii values (upper and lower case).
Upvotes: 5