Reputation: 46
I have a set of data in a MySQL database. I'm retrieving a list of results ordered by a field called login. When I retrieve this set, two of the rows come back in the following order.
[email protected]
[email protected]
However, when I compare them in Java (necessary because of the post-processing needed to merge objects application side), [email protected] has a value less than [email protected]. In other words, the String comparison would expect them to be in the following order.
[email protected]
[email protected]
Everything else is returned in the correct order. I assume the difference comes from the way Java and MySQL string comparisons treat certain characters. How do I get these to sort in a consistent order? (I'm OK with either order; I just need consistency.)
Query I'm running:
select t0.id as envUserId, t0.environment_id as envId, t0.environment_name as envName,
       t0.customer_name as customerName, t0.version version, t0.user_id as userId,
       t0.login as userLogin, t0.sso_granted_roles as sso_granted_roles,
       t1z_.role_name as defaultRole, t3.id as customRoleId, t3.name as customRoleName
from environment_user t0
left join (
    select distinct eu.id, eu.login
    from environment_user eu
    left join environment_user_role eur on eu.id = eur.environment_user_id
    left join environment_user_custom_role eucr on eu.id = eucr.environment_user_id
    left join custom_role cr on eucr.custom_role_id = cr.id
    where eu.environment_id = '5a83069a-70d2-4d0e-9847-c709725281c5'
    and (eur.role_name in ('Role1','Role2')
        or cr.name in ('Role1','Role2'))
    order by eu.login limit 0, 200) f on t0.id = f.id
left outer join environment_user_role t1z_ on t1z_.environment_user_id = t0.id
left outer join environment_user_custom_role ct1z_ on ct1z_.environment_user_id = t0.id
left outer join custom_role t3 on t3.id = ct1z_.custom_role_id
where t0.environment_id = '5a83069a-70d2-4d0e-9847-c709725281c5'
and t0.id = f.id
order by userLogin asc
What I'm getting back (extra lines above and below have been removed for clarity)
'c2ad9f82-e0d5-4f8d-a5fe-a2d72d901b98', '5a83069a-70d2-4d0e-9847-c709725281c5', 'SearchTestDomainEnv', 'SearchTestDomainCustomer', '1', '649ea0bc-dab7-4ad2-a534-546f9817e252', '[email protected]', '0', 'Role1', NULL, NULL
'83313002-49a3-45f2-9013-e8dab15789d5', '5a83069a-70d2-4d0e-9847-c709725281c5', 'SearchTestDomainEnv', 'SearchTestDomainCustomer', '1', '40d5c22a-33f8-4a37-a4db-63e3709cfae7', '[email protected]', '0', 'Role1', NULL, NULL
'5ba69c88-a773-4d5b-835d-c88688867d6a', '5a83069a-70d2-4d0e-9847-c709725281c5', 'SearchTestDomainEnv', 'SearchTestDomainCustomer', '1', '91a7609a-4809-4e27-9d6f-448ff62b38b3', '[email protected]', '0', 'Role1', NULL, NULL
'6833a699-b5ca-46aa-8a53-23a6ef41e1f8', '5a83069a-70d2-4d0e-9847-c709725281c5', 'SearchTestDomainEnv', 'SearchTestDomainCustomer', '1', '718808fa-3799-457f-9cdb-88ef887e0492', '[email protected]', '0', 'Role1', NULL, NULL
'c466c478-8a32-4926-9cde-06a40071ac85', '5a83069a-70d2-4d0e-9847-c709725281c5', 'SearchTestDomainEnv', 'SearchTestDomainCustomer', '1', '6282739d-76ea-4dbb-be5e-b7d64d3b3f3f', '[email protected]', '0', 'Role1', NULL, NULL
'5b04d561-6c20-4703-aa96-f17eda0405b6', '5a83069a-70d2-4d0e-9847-c709725281c5', 'SearchTestDomainEnv', 'SearchTestDomainCustomer', '1', 'fb644427-46ab-42e4-8295-65a397409c0d', '[email protected]', '0', 'Role1', NULL, NULL
'27116bed-a1a6-483c-9e7b-97158786245c', '5a83069a-70d2-4d0e-9847-c709725281c5', 'SearchTestDomainEnv', 'SearchTestDomainCustomer', '1', '246f392d-6d27-402e-837f-98384da0abb6', '[email protected]', '0', 'Role1', NULL, NULL
Upvotes: 2
Views: 1450
Reputation: 46
Found a solution in this Stack Overflow question. It looks like Java's lexicographic String comparison is not based on the natural-language ordering that utf8_unicode_ci implements. The solution is to create a java.text.Collator and use its compare method to perform the sort instead.
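A minimal sketch of that approach, for illustration only. The class name, method names, and the two sample logins are made up, and Locale.ROOT with SECONDARY strength is an assumption meant to approximate a case-insensitive collation. Java's Collator rules are not guaranteed to match utf8_unicode_ci for every character, so it is worth checking against your real data.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

class CollatorSortDemo {

    // Sorts a copy of the logins with locale-aware collation rules.
    // SECONDARY strength ignores case differences (an assumption chosen
    // to resemble a "_ci" collation; adjust to your needs).
    static List<String> sortWithCollator(List<String> logins) {
        Collator collator = Collator.getInstance(Locale.ROOT);
        collator.setStrength(Collator.SECONDARY);
        List<String> copy = new ArrayList<>(logins);
        copy.sort(collator); // Collator implements Comparator<Object>
        return copy;
    }

    // Sorts a copy of the logins with String.compareTo (UTF-16 code unit order).
    static List<String> sortWithCompareTo(List<String> logins) {
        List<String> copy = new ArrayList<>(logins);
        Collections.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        // Hypothetical sample logins, not the ones from the question.
        List<String> logins = Arrays.asList("Bob@example.com", "alice@example.com");
        System.out.println(sortWithCompareTo(logins)); // [Bob@example.com, alice@example.com]
        System.out.println(sortWithCollator(logins));  // [alice@example.com, Bob@example.com]
    }
}
```

If the merge step on the application side needs a comparator, the same Collator instance can be passed wherever a Comparator<String> is expected, so the Java ordering and the ORDER BY ordering are at least driven by the same kind of rules.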
Upvotes: 0
Reputation: 2652
Here is what I found:
1) When I run the query in MySQL, I get the following result:
2) The following Test.java sorts the same two strings:
import java.util.Arrays;

public class Test {
    public static void main(String[] args) {
        String[] arr={"[email protected]", "[email protected]"};
        Arrays.sort(arr);
        System.out.println(Arrays.toString(arr));
    }
}
and the output is:
3) To check the collation, I ran the following query:
SELECT table_catalog,
table_schema,
table_name,
column_name,
collation_name
FROM information_schema.columns
WHERE table_schema = 'test'
AND column_name = 'email';
And the output is:
So we see that MySQL and Java sort the two strings in the same order when the MySQL collation is utf8_general_ci.
Upvotes: 3
Reputation: 331
See the Javadoc comment on String.compareTo():
/**
* Compares two strings lexicographically.
* The comparison is based on the Unicode value of each character in
* the strings. The character sequence represented by this
* {@code String} object is compared lexicographically to the
* character sequence represented by the argument string. The result is
* a negative integer if this {@code String} object
* lexicographically precedes the argument string. The result is a
* positive integer if this {@code String} object lexicographically
* follows the argument string. The result is zero if the strings
* are equal; {@code compareTo} returns {@code 0} exactly when
* the {@link #equals(Object)} method would return {@code true}.
* <p>
* This is the definition of lexicographic ordering. If two strings are
* different, then either they have different characters at some index
* that is a valid index for both strings, or their lengths are different,
* or both. If they have different characters at one or more index
* positions, let <i>k</i> be the smallest such index; then the string
* whose character at position <i>k</i> has the smaller value, as
* determined by using the < operator, lexicographically precedes the
* other string. In this case, {@code compareTo} returns the
* difference of the two character values at position {@code k} in
* the two string -- that is, the value:
* <blockquote><pre>
* this.charAt(k)-anotherString.charAt(k)
* </pre></blockquote>
* If there is no index position at which they differ, then the shorter
* string lexicographically precedes the longer string. In this case,
* {@code compareTo} returns the difference of the lengths of the
* strings -- that is, the value:
* <blockquote><pre>
* this.length()-anotherString.length()
* </pre></blockquote>
*
* @param anotherString the {@code String} to be compared.
* @return the value {@code 0} if the argument string is equal to
* this string; a value less than {@code 0} if this string
* is lexicographically less than the string argument; and a
* value greater than {@code 0} if this string is
* lexicographically greater than the string argument.
*/
public int compareTo(String anotherString) {
    int len1 = value.length;
    int len2 = anotherString.value.length;
    int lim = Math.min(len1, len2);
    char v1[] = value;
    char v2[] = anotherString.value;
    int k = 0;
    while (k < lim) {
        char c1 = v1[k];
        char c2 = v2[k];
        if (c1 != c2) {
            return c1 - c2;
        }
        k++;
    }
    return len1 - len2;
}
MySQL compares strings in a similar way, but the result depends on the character set and collation of the database.
More information here: MYSQL
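To make the Javadoc above concrete, here is a small sketch of what code-unit comparison means in practice. The class name, helper names, and sample strings are hypothetical. It only demonstrates String.compareTo and String.CASE_INSENSITIVE_ORDER; it does not reproduce any particular MySQL collation.

```java
class CompareToDemo {

    // Plain String.compareTo: compares UTF-16 code units, so every
    // uppercase ASCII letter sorts before every lowercase one.
    static int byCodeUnit(String a, String b) {
        return a.compareTo(b);
    }

    // Case-insensitive comparison from the JDK; closer to what a "_ci"
    // collation does for plain ASCII letters, but not locale-aware.
    static int ignoringCase(String a, String b) {
        return String.CASE_INSENSITIVE_ORDER.compare(a, b);
    }

    public static void main(String[] args) {
        // 'B' is 66 and 'a' is 97, so the result is 66 - 97.
        System.out.println(byCodeUnit("Bob", "alice"));   // -31
        // No differing index, so the result is the length difference 3 - 5.
        System.out.println(byCodeUnit("abc", "abcde"));   // -2
        // Ignoring case, 'b' comes after 'a', so the result is positive.
        System.out.println(ignoringCase("Bob", "alice") > 0); // true
    }
}
```

The first call shows why a case-insensitive MySQL collation and Java can disagree: under compareTo, "Bob" precedes "alice", while a case-insensitive ordering puts "alice" first.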
Upvotes: 0