patrick

Reputation: 4852

Comparing two Strings in Scala using > and <

I'm trying to compare two Strings in Scala to find out which comes first in the alphabet, i.e. a before b, aab before abb, etc.

It seems like the > and < operators are doing exactly this job:

scala> "b" > "a"

res14: Boolean = true

scala> "b" < "a"

res15: Boolean = false

scala> "aab" < "abb"

res16: Boolean = true

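However, the docs did not enlighten me on what this really does, and none of the tutorials online seem to use this method to compare strings. Thus, I was wondering:

How exactly does it work?

Is this a fail-safe approach to do what I want to do, i.e. compare two Strings for alphabetical order?

In particular,

scala> "?" > "!"

res25: Boolean = true

is not intuitive to me.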

Upvotes: 3

Views: 6274

Answers (2)

Alexey Romanov

Reputation: 170713

How exactly does it work

> and the other comparison operators delegate to the compareTo method, whose exact behavior is documented at https://docs.oracle.com/javase/8/docs/api/java/lang/String.html#compareTo(java.lang.String). E.g. x > y is the same as x.compareTo(y) > 0.
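As a quick check of that equivalence, here is a minimal sketch that evaluates both forms for the pairs from the question. The object and value names (OperatorVsCompareTo, pairs, results) are made up for illustration.

```scala
object OperatorVsCompareTo {
  // Pairs taken from the question's REPL session.
  val pairs: List[(String, String)] = List(("b", "a"), ("aab", "abb"), ("?", "!"))

  // For each pair, evaluate the operator form and the compareTo form side by side.
  val results: List[(Boolean, Boolean)] =
    pairs.map { case (x, y) => (x > y, x.compareTo(y) > 0) }

  def main(args: Array[String]): Unit =
    results.foreach(println) // each tuple holds two identical Booleans
}
```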

Is this a fail safe approach to do what I want to do, i.e. compare two Strings for alphabetical order?

No. It provides an order, but not the alphabetical one, if you take case (and diacritics, and...) into account. E.g. "B" > "a" is false, because uppercase letters have lower character values than lowercase ones.

For full alphabetical ordering, you probably want a Collator. (I'd also suggest reading https://english.stackexchange.com/a/212630/26340 to start appreciating how non-trivial the rules are even just for English).
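A minimal sketch of the difference, assuming java.text.Collator with Locale.ENGLISH is the kind of Collator meant here. The locale, the word list and the object name CollatorDemo are all just choices for illustration.

```scala
import java.text.Collator
import java.util.Locale

object CollatorDemo {
  // Assumption: English collation rules; pick the locale that matches your data.
  val collator: Collator = Collator.getInstance(Locale.ENGLISH)

  val words: List[String] = List("banana", "apple", "Cherry")

  // Default String ordering compares Char values, so the uppercase "Cherry" sorts first.
  val byCharValue: List[String] = words.sorted

  // Collator.compare(String, String) returns a negative, zero or positive Int.
  val byCollator: List[String] =
    words.sortWith((x, y) => collator.compare(x, y) < 0)

  def main(args: Array[String]): Unit = {
    println(byCharValue) // List(Cherry, apple, banana)
    println(byCollator)  // List(apple, banana, Cherry)
  }
}
```

Whether this matches what you would call "alphabetical" still depends on the locale and on the collator's strength setting, so treat it as a starting point.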

Upvotes: 2

Shawn Mehan

Reputation: 4568

As you saw in the docs, > (and all the other comparison operators) are defined for Strings. Ultimately, the comparison breaks each String down into its underlying Chars, which are Unicode (UTF-16) values, and compares those numerically. So to predict the result, you need to compare the Unicode values of the characters.

Referring to a definitive list one finds that

U+0021 ! Exclamation mark

and

U+003F ? Question mark

so, indeed, "!" < "?" is true.
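To make the Char-by-Char logic concrete, here is a rough sketch that mirrors the documented behavior of String.compareTo: the difference of the first mismatching Chars, or the difference in length if one String is a prefix of the other. The names CharByCharCompare and compareChars are hypothetical, and this is not the library's actual implementation.

```scala
object CharByCharCompare {
  // The numeric values behind the two characters in question.
  val bang: Int = '!'.toInt     // 33, i.e. U+0021
  val question: Int = '?'.toInt // 63, i.e. U+003F

  // Hypothetical helper: walk both Strings in parallel, return the difference
  // of the first pair of Chars that differ, otherwise the length difference.
  def compareChars(x: String, y: String): Int =
    x.zip(y)
      .collectFirst { case (a, b) if a != b => a - b }
      .getOrElse(x.length - y.length)

  def main(args: Array[String]): Unit = {
    println(compareChars("!", "?"))     // -30, same as "!".compareTo("?")
    println(compareChars("aab", "abb")) // -1
  }
}
```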

I believe the comparison is well-defined for any Unicode sequence you can supply. For example:

val string1 = "\u2200"
val string2 = "\uFB30"

string1 < string2

val string3 = string1 + string2
val string4 = string2 + string1

string3 < string4
string4 < string3

string3 == string1 + string2
string4 == string1 + string2

which in my worksheet gives me:

string1: String = ∀
string2: String = אּ // note: alef (U+FB30) comes after "for all" (U+2200) in Unicode.

res7: Boolean = true

string3: String = ∀אּ
string4: String = אּ∀

res8: Boolean = true
res9: Boolean = false

res10: Boolean = true
res11: Boolean = false

Upvotes: 3
