Reputation: 1331
I want to sort Java strings in alphabetical order. The sort should work similar to Oracle SQL order by. I've tried to use the Java Collator, but it prioritizes the small letters over the large letters. There are problems with the non English letters too...
For example:
select * from TABLE1 order by COLUMN1;
Returns the strings in the followin order: A, a, Á, á, Ä, ä, B, b, C, C (it is correct for me)
Collections.sort(strings, Collator.getInstance());
Orders the strings like this: a, A, á, ä, Ä, Á, b, B, C, C (there is a problem with the á, ä, Ä, Á order)
(The locale is the same in both cases)
I don't want to type the whole alphabet, because I will propably forget some special letter. Out application will be used by many different people from many european countries.
Upvotes: 5
Views: 2685
Reputation: 3620
As per my understanding This will help you
Collator coll = Collator.getInstance(locale);
coll.setStrength(Collator.PRIMARY)
Collections.sort(words, coll);
OR you can try like this
List<String> words = Arrays.asList(
"Äbc", "äbc", "Àbc", "àbc", "Abc", "abc", "ABC"
);
log("Different 'Collation Strength' values give different sort results: ");
log(words + " - Original Data");
sort(words, Strength.Primary);
sort(words, Strength.Secondary);
sort(words, Strength.Tertiary);
private enum Strength {
Primary(Collator.PRIMARY), //base char
Secondary(Collator.SECONDARY), //base char + accent
Tertiary(Collator.TERTIARY), // base char + accent + case
Identical(Collator.IDENTICAL); //base char + accent + case + bits
int getStrength() { return fStrength; }
private int fStrength;
private Strength(int aStrength){
fStrength = aStrength;
}
}
private static void sort(List<String> aWords, Strength aStrength){
Collator collator = Collator.getInstance(TEST_LOCALE);
collator.setStrength(aStrength.getStrength());
Collections.sort(aWords, collator);
log(aWords.toString() + " " + aStrength);
}
This class outputs the following:
Different 'Collation Strength' values give different sort results:
[Äbc, äbc, Àbc, àbc, Abc, abc, ABC] - Original Data
[Äbc, äbc, Àbc, àbc, Abc, abc, ABC] Primary
[Abc, abc, ABC, Àbc, àbc, Äbc, äbc] Secondary
[abc, Abc, ABC, àbc, Àbc, äbc, Äbc] Tertiary
Reference got from
Upvotes: 1
Reputation: 27104
Sorting is complicated. The Oracle documentation gives a complete overview of the different aspects.
It'd be nice to know the exact sorting you're trying to reproduce, i.e. the exact value of NLS_SORT
. You can find out by executing
SELECT SYS_CONTEXT ('USERENV', 'NLS_SORT') from SYS.DUAL;
The sort you're using produces
A, a, Á, á, Ä, ä, B, b, C, c
It's not clear what the order of input is.
A
before a
. This is odd. I am inferring it's not actually preferring A over a but considers them equal, i.e. is case-insensitive.An NLS_SORT
of GENERIC_M_CI
fits the bill. You can check by running it in oracle:
[...] ORDER BY NLSSORT(<colname>, 'NLS_SORT=GENERIC_M_CI');
A Java Collator has a setStrength()
method which accepts values PRIMARY
, SECONDARY
, TERTIARY
and IDENTICAL
.
The exact interpretation depends on the locale but the javadocs give as an example
a
and b
only.a
and á
.a
and A
.So a Collator with strength SECONDARY should serve you fine.
On my machine, with en_US default locale, I tried this out:
List<String> strings = Arrays.asList("A", "Ä", "Á", "B", "C", "a", "á", "ä", "b", "c");
Collator collator = Collator.getInstance();
collator.setStrength(Collator.SECONDARY);
Collections.sort(strings, collator);
System.out.println(strings);
Prints
[A, a, Á, á, Ä, ä, B, b, C, c]
(But if you'd put the a
before the A
, it'd have left that order untouched.)
Upvotes: 6
Reputation: 6284
1)You'll need one string that represents the order of the Oracle sort. I'm calling this string oracleSort. You could try to search the internet for this or you could insert one row of every letter into the database, query that column and get the results back. From the results assemble your sort string. It sounds labor intensive but you could also use a Java program to fill the database.
oracleSort = "AaÁáÄäBbCc..."
2)I think you would need to implement a Comparator that compares two strings. http://docs.oracle.com/javase/1.5.0/docs/api/java/util/Comparator.html Go character by character through the strings and compare the indices of the characters as found in oracleSort. The first character with the lesser index than its corresponding position is the lesser string.
oracleSort.indexOf("a") is 1.
oracleSort.indexOf("Á") is 2.
"a" is less than "Á"
3)Later I was thinking there might be an alternative to the pattern. It looks like the order is Grouped by letter without accents, then groups sorted by ASCII uppercase, then uppercase before lowercase.
Therefore, you could use Apache commons-lang StringUtils.stripAccents to make string copies without accents and uppercase. If those are equal, the compare copies with accents but uppercase. If those are equal, then check each character to see if one is uppercase and one is lowercase.
public static int compare(String one, String two)
{
String oneNoAccent = StringUtils.stripAccents(one).toUpperCase();
String twoNoAccent = StringUtils.stripAccents(two).toUpperCase();
int compare = oneNoAccent.compareTo(twoNoAccent);
if(compare == 0)
{
String oneU = one.toUpperCase();
String twoU = two.toUpperCase();
compare = oneU.compareTo(twoU);
if(compare == 0)
{
//TODO:
}
}
return compare;
}
Upvotes: 0