Reputation: 4852
I'm trying to compare two Strings in Scala, trying to find out which comes first in the alphabet. I.e. a before b, aab before abb, etc.
It seems like the > and < operators are doing exactly this job:
scala> "b" > "a"
res14: Boolean = true
scala> "b" < "a"
res15: Boolean = false
scala> "aab" < "abb"
res16: Boolean = true
However, the doc did not enlighten me on what this really does, and none of the tutorials online seem to be using this method to compare strings. Thus, I was wondering:
Is this a fail safe approach to do what I want to do, i.e. compare two Strings for alphabetical order?
Is there a more common approach in Scala that I am missing here?
How exactly does it work - e.g.
scala> "?" > "!"
res25: Boolean = true
is not intuitive to me.
Upvotes: 3
Views: 6274
Reputation: 170713
How exactly does it work
> and the other comparison operators delegate to the compareTo method, whose exact behavior is documented at https://docs.oracle.com/javase/8/docs/api/java/lang/String.html#compareTo(java.lang.String). E.g. x > y is the same as x.compareTo(y) > 0.
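To make the equivalence concrete, here is a minimal sketch. The object name CompareToSketch and the helper greaterThan are made up for illustration and are not part of any library; only compareTo and the built-in operators are real.

```scala
object CompareToSketch {
  // Hypothetical helper: spells out what `x > y` means for two Strings.
  def greaterThan(x: String, y: String): Boolean = x.compareTo(y) > 0

  // The operator and the explicit compareTo call should always agree.
  val viaOperator: Boolean = "aab" < "abb"
  val viaCompareTo: Boolean = "aab".compareTo("abb") < 0

  // Per the compareTo documentation, the result is the difference of the
  // first pair of differing chars: 'a' - 'b' at index 1.
  val firstDifference: Int = "aab".compareTo("abb")

  def main(args: Array[String]): Unit = {
    println(s"operator: $viaOperator, compareTo: $viaCompareTo")
    println(s"compareTo result: $firstDifference")
    println(s"""greaterThan("b", "a"): ${greaterThan("b", "a")}""")
  }
}
```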
Is this a fail safe approach to do what I want to do, i.e. compare two Strings for alphabetical order?
No. It provides an order, but not the alphabetical one, if you take case (and diacritics, and...) into account. E.g. "B" > "a" is false, because uppercase letters have smaller Unicode values than lowercase ones.
For full alphabetical ordering, you probably want a Collator.
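A rough sketch of what a Collator-based comparison could look like, using java.text.Collator from the JDK. The choice of Locale.ENGLISH, the object name CollationSketch and the sample words are assumptions made for this example; pick the locale that matches your data.

```scala
import java.text.Collator
import java.util.Locale

object CollationSketch {
  // Assumption: English collation rules are what you want.
  val collator: Collator = Collator.getInstance(Locale.ENGLISH)

  val words: List[String] = List("banana", "apple", "Cherry")

  // Default String ordering compares char values, so "Cherry" sorts first
  // (uppercase 'C' has a smaller value than lowercase 'a').
  val sortedByCharValue: List[String] = words.sorted

  // The collator compares letters first and treats case as a minor difference.
  val sortedByCollator: List[String] =
    words.sortWith((a, b) => collator.compare(a, b) < 0)

  def main(args: Array[String]): Unit = {
    println(sortedByCharValue)
    println(sortedByCollator)
  }
}
```

With these inputs, the plain sort puts "Cherry" first, while the collator version gives apple, banana, Cherry.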
(I'd also suggest reading https://english.stackexchange.com/a/212630/26340 to start appreciating how non-trivial the rules are even just for English).
Upvotes: 2
Reputation: 4568
As you saw in the docs, you can use the defined > (and all the other comparison operators) on Strings. Ultimately, the comparison works on the underlying Chars of each String, and those are Unicode values. So to predict the result you see in the REPL, compare the Unicode values of the characters, position by position.
Referring to a definitive list, one finds that
U+0021 ! Exclamation mark
and
U+003F ? Question mark
so, indeed, "!" < "?" is true.
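You can check the numeric values directly and, as a sketch, reimplement the comparison by hand. The object CodeUnitSketch and the helper codeUnitLess are hypothetical names for this illustration; the helper is only meant to mirror the documented behavior of compareTo, not to replace it.

```scala
object CodeUnitSketch {
  val bang: Int = '!'.toInt      // U+0021
  val question: Int = '?'.toInt  // U+003F

  // Hypothetical helper: find the first position where the chars differ and
  // compare their numeric values; if one string is a prefix of the other,
  // the shorter one comes first.
  def codeUnitLess(x: String, y: String): Boolean = {
    val firstDiff = x.zip(y).collectFirst {
      case (a, b) if a != b => a.toInt - b.toInt
    }
    firstDiff.getOrElse(x.length - y.length) < 0
  }

  def main(args: Array[String]): Unit = {
    println(s"'!' = $bang, '?' = $question")
    println(codeUnitLess("!", "?") == ("!" < "?"))
    println(codeUnitLess("aab", "abb") == ("aab" < "abb"))
  }
}
```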
I believe that the comparison action is failsafe for any unicode sequence that you can supply. See even:
val string1 = "\u2200"
val string2 = "\uFB30"
string1 < string2
val string3 = string1 + string2
val string4 = string2 + string1
string3 < string4
string4 < string3
string3 == string1 + string2
string4 == string1 + string2
which in my worksheet gives me:
string1: String = ∀
string2: String = אּ // note: aleph comes after All in the unicode definition.
res7: Boolean = true
string3: String = ∀אּ
string4: String = אּ∀
res8: Boolean = true
res9: Boolean = false
res10: Boolean = true
res11: Boolean = false
Upvotes: 3