kfan

Reputation: 46

MySQL ORDER BY comparison vs Java compareTo comparison

I have a set of data in a MySQL database. I'm retrieving a list of results ordered by a field called login. When I retrieve this set, two of the rows come back in the following order.

[email protected]
[email protected]

However, when I compare them in Java (necessary because of the post-processing needed to merge objects application side), [email protected] has a value less than [email protected]. In other words, the String comparison would expect them to be in the following order.

[email protected]
[email protected]

As a whole, everything else is returned in the correct order. I assume the difference is most likely in the way Java and MySQL string comparisons treat certain values. How do I get these to return in a consistent order? (I'm OK with either order; I just need consistency.)

Query I'm running:

select t0.id as envUserId , t0.environment_id as envId, t0.environment_name as envName,   t0.customer_name as customerName, t0.version version, t0.user_id as userId, t0.login as userLogin,   t0.sso_granted_roles as sso_granted_roles,   t1z_.role_name as defaultRole, t3.id as customRoleId, t3.name as customRoleName
  from environment_user t0   
  left join ( 
    select distinct eu.id, eu.login       
    from  environment_user eu                 
    left join environment_user_role eur on eu.id = eur.environment_user_id   
    left join environment_user_custom_role eucr on eu.id = eucr.environment_user_id   
    left join custom_role cr on eucr.custom_role_id = cr.id        
    where eu.environment_id = '5a83069a-70d2-4d0e-9847-c709725281c5'             
    and (eur.role_name in ('Role1','Role2') 
        or cr.name in ('Role1','Role2'))       
    order by eu.login limit 0, 200) f on t0.id = f.id   
  left outer join environment_user_role t1z_ on t1z_.environment_user_id = t0.id   
  left outer join environment_user_custom_role ct1z_ on ct1z_.environment_user_id = t0.id   
  left outer join custom_role t3 on t3.id = ct1z_.custom_role_id   
  where t0.environment_id = '5a83069a-70d2-4d0e-9847-c709725281c5'     
  and t0.id = f.id
  order by userLogin asc

What I'm getting back (extra lines above and below have been removed for clarity)

'c2ad9f82-e0d5-4f8d-a5fe-a2d72d901b98', '5a83069a-70d2-4d0e-9847-c709725281c5', 'SearchTestDomainEnv', 'SearchTestDomainCustomer', '1', '649ea0bc-dab7-4ad2-a534-546f9817e252', '[email protected]', '0', 'Role1', NULL, NULL
'83313002-49a3-45f2-9013-e8dab15789d5', '5a83069a-70d2-4d0e-9847-c709725281c5', 'SearchTestDomainEnv', 'SearchTestDomainCustomer', '1', '40d5c22a-33f8-4a37-a4db-63e3709cfae7', '[email protected]', '0', 'Role1', NULL, NULL
'5ba69c88-a773-4d5b-835d-c88688867d6a', '5a83069a-70d2-4d0e-9847-c709725281c5', 'SearchTestDomainEnv', 'SearchTestDomainCustomer', '1', '91a7609a-4809-4e27-9d6f-448ff62b38b3', '[email protected]', '0', 'Role1', NULL, NULL
'6833a699-b5ca-46aa-8a53-23a6ef41e1f8', '5a83069a-70d2-4d0e-9847-c709725281c5', 'SearchTestDomainEnv', 'SearchTestDomainCustomer', '1', '718808fa-3799-457f-9cdb-88ef887e0492', '[email protected]', '0', 'Role1', NULL, NULL
'c466c478-8a32-4926-9cde-06a40071ac85', '5a83069a-70d2-4d0e-9847-c709725281c5', 'SearchTestDomainEnv', 'SearchTestDomainCustomer', '1', '6282739d-76ea-4dbb-be5e-b7d64d3b3f3f', '[email protected]', '0', 'Role1', NULL, NULL
'5b04d561-6c20-4703-aa96-f17eda0405b6', '5a83069a-70d2-4d0e-9847-c709725281c5', 'SearchTestDomainEnv', 'SearchTestDomainCustomer', '1', 'fb644427-46ab-42e4-8295-65a397409c0d', '[email protected]', '0', 'Role1', NULL, NULL
'27116bed-a1a6-483c-9e7b-97158786245c', '5a83069a-70d2-4d0e-9847-c709725281c5', 'SearchTestDomainEnv', 'SearchTestDomainCustomer', '1', '246f392d-6d27-402e-837f-98384da0abb6', '[email protected]', '0', 'Role1', NULL, NULL

Upvotes: 2

Views: 1450

Answers (3)

kfan

Reputation: 46

Found a solution in this Stack Overflow question. It looks like Java's String.compareTo is a lexicographic comparison of Unicode character values, not a natural-language sort like the one the utf8_unicode_ci collation implements. The solution here is to create a java.text.Collator and use its compare method to perform our sort instead.
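
A minimal sketch of that approach, for illustration only. The class name, method names, and logins are hypothetical, and the choice of Locale.ROOT with PRIMARY strength is an assumption that only approximates a case-insensitive collation; it is not guaranteed to reproduce MySQL's order for every input.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class LoginSort {
    // Sorts with String.compareTo, i.e. by UTF-16 code unit values.
    static List<String> sortByCodeUnit(List<String> logins) {
        List<String> copy = new ArrayList<>(logins);
        copy.sort(null); // null comparator means natural ordering (String.compareTo)
        return copy;
    }

    // Sorts with a Collator, which applies language-aware ordering rules.
    static List<String> sortByCollator(List<String> logins) {
        // Assumption: root locale and PRIMARY strength (case differences ignored).
        Collator collator = Collator.getInstance(Locale.ROOT);
        collator.setStrength(Collator.PRIMARY);
        List<String> copy = new ArrayList<>(logins);
        copy.sort(collator); // Collator implements Comparator<Object>
        return copy;
    }

    public static void main(String[] args) {
        // Hypothetical logins, not the ones from the question.
        List<String> logins = Arrays.asList("alice@example.com", "Bob@example.com");
        System.out.println(sortByCodeUnit(logins));  // uppercase 'B' sorts before lowercase 'a'
        System.out.println(sortByCollator(logins));  // 'a' sorts before 'B'
    }
}
```

If the merge step on the application side uses the same Collator for every comparison, the Java order at least stays consistent with itself, which is the part the post-processing depends on.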

Upvotes: 0

Sanjeev Saha

Reputation: 2652

What I have found out is that:

1) When I run the query in MySQL, I get the following result:

(screenshot of the query result, image not available)

2) The following Test.java sorts the same two Strings to find their order in Java:

import java.util.Arrays;

public class Test {
  public static void main(String[] args) {   
      String[] arr={"[email protected]", "[email protected]"};
      Arrays.sort(arr);   
      System.out.println(Arrays.toString(arr));
  }
}

and the output is:

(screenshot of the program output, image not available)

3) To see the collation of the column, I ran the following query:

SELECT table_catalog,
       table_schema,
       table_name,
       column_name,
       collation_name
FROM   information_schema.columns
WHERE  table_schema = 'test'
       AND column_name = 'email'; 

And the output is:

(screenshot of the collation query output, image not available)

So we see that both MySQL and Java sort the two Strings in the same order when MySQL collation is utf8_general_ci.

Upvotes: 3

star

Reputation: 331

See the comment on String.compareTo()

/**
     * Compares two strings lexicographically.
     * The comparison is based on the Unicode value of each character in
     * the strings. The character sequence represented by this
     * {@code String} object is compared lexicographically to the
     * character sequence represented by the argument string. The result is
     * a negative integer if this {@code String} object
     * lexicographically precedes the argument string. The result is a
     * positive integer if this {@code String} object lexicographically
     * follows the argument string. The result is zero if the strings
     * are equal; {@code compareTo} returns {@code 0} exactly when
     * the {@link #equals(Object)} method would return {@code true}.
     * <p>
     * This is the definition of lexicographic ordering. If two strings are
     * different, then either they have different characters at some index
     * that is a valid index for both strings, or their lengths are different,
     * or both. If they have different characters at one or more index
     * positions, let <i>k</i> be the smallest such index; then the string
     * whose character at position <i>k</i> has the smaller value, as
     * determined by using the &lt; operator, lexicographically precedes the
     * other string. In this case, {@code compareTo} returns the
     * difference of the two character values at position {@code k} in
     * the two string -- that is, the value:
     * <blockquote><pre>
     * this.charAt(k)-anotherString.charAt(k)
     * </pre></blockquote>
     * If there is no index position at which they differ, then the shorter
     * string lexicographically precedes the longer string. In this case,
     * {@code compareTo} returns the difference of the lengths of the
     * strings -- that is, the value:
     * <blockquote><pre>
     * this.length()-anotherString.length()
     * </pre></blockquote>
     *
     * @param   anotherString   the {@code String} to be compared.
     * @return  the value {@code 0} if the argument string is equal to
     *          this string; a value less than {@code 0} if this string
     *          is lexicographically less than the string argument; and a
     *          value greater than {@code 0} if this string is
     *          lexicographically greater than the string argument.
     */
    public int compareTo(String anotherString) {
        int len1 = value.length;
        int len2 = anotherString.value.length;
        int lim = Math.min(len1, len2);
        char v1[] = value;
        char v2[] = anotherString.value;

        int k = 0;
        while (k < lim) {
            char c1 = v1[k];
            char c2 = v2[k];
            if (c1 != c2) {
                return c1 - c2;
            }
            k++;
        }
        return len1 - len2;
    }

MySQL compares strings in a similar way, but the result depends on the character set and collation of the database column. More information here: MYSQL
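
To make the compareTo rules quoted above concrete, here is a small sketch with hypothetical inputs. It shows the two documented return values (the character difference and the length difference) and, for contrast, String.CASE_INSENSITIVE_ORDER, which is only a rough stand-in for a case-insensitive collation and not a reproduction of any particular MySQL collation.

```java
public class CompareToDemo {
    // Plain String.compareTo: compares char values position by position.
    static int codeUnitCompare(String a, String b) {
        return a.compareTo(b);
    }

    // Case-insensitive comparison from the JDK, shown only for contrast.
    static int caseInsensitiveCompare(String a, String b) {
        return String.CASE_INSENSITIVE_ORDER.compare(a, b);
    }

    public static void main(String[] args) {
        // First differing index is 0: 'a' (97) - 'B' (66) = 31, so "a" sorts after "B".
        System.out.println(codeUnitCompare("a", "B"));
        // No differing index: result is the length difference, 3 - 5 = -2.
        System.out.println(codeUnitCompare("abc", "abcde"));
        // Ignoring case, "a" sorts before "B", so the result is negative.
        System.out.println(caseInsensitiveCompare("a", "B"));
    }
}
```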

Upvotes: 0
