user1552545
user1552545

Reputation: 1331

Oracle equivalent sort in Java

I want to sort Java strings in alphabetical order. The sort should work similar to Oracle SQL order by. I've tried to use the Java Collator, but it prioritizes the small letters over the large letters. There are problems with the non English letters too...

For example:

select * from TABLE1 order by COLUMN1;

Returns the strings in the followin order: A, a, Á, á, Ä, ä, B, b, C, C (it is correct for me)

Collections.sort(strings, Collator.getInstance());

Orders the strings like this: a, A, á, ä, Ä, Á, b, B, C, C (there is a problem with the á, ä, Ä, Á order)

(The locale is the same in both cases)

I don't want to type the whole alphabet, because I will propably forget some special letter. Out application will be used by many different people from many european countries.

Upvotes: 5

Views: 2685

Answers (3)

Brave Soul
Brave Soul

Reputation: 3620

As per my understanding This will help you

  Collator coll = Collator.getInstance(locale);
  coll.setStrength(Collator.PRIMARY) 
  Collections.sort(words, coll);

OR you can try like this

 List<String> words = Arrays.asList(
      "Äbc", "äbc", "Àbc", "àbc", "Abc", "abc", "ABC"
    );

    log("Different 'Collation Strength' values give different sort results: ");
    log(words + " - Original Data");
    sort(words, Strength.Primary);
    sort(words, Strength.Secondary);
    sort(words, Strength.Tertiary);

    private enum Strength {
    Primary(Collator.PRIMARY), //base char
    Secondary(Collator.SECONDARY), //base char + accent
    Tertiary(Collator.TERTIARY), // base char + accent + case
    Identical(Collator.IDENTICAL); //base char + accent + case + bits

    int getStrength() { return fStrength; }

    private int fStrength;
    private Strength(int aStrength){
      fStrength = aStrength;
    }
  }

  private static void sort(List<String> aWords, Strength aStrength){
    Collator collator = Collator.getInstance(TEST_LOCALE);
    collator.setStrength(aStrength.getStrength());
    Collections.sort(aWords, collator);
    log(aWords.toString() + " " + aStrength);
  }

This class outputs the following:

Different 'Collation Strength' values give different sort results: 
[Äbc, äbc, Àbc, àbc, Abc, abc, ABC] - Original Data
[Äbc, äbc, Àbc, àbc, Abc, abc, ABC] Primary
[Abc, abc, ABC, Àbc, àbc, Äbc, äbc] Secondary
[abc, Abc, ABC, àbc, Àbc, äbc, Äbc] Tertiary

Reference got from

Comparator

Localized ordering

Upvotes: 1

flup
flup

Reputation: 27104

Sorting is complicated. The Oracle documentation gives a complete overview of the different aspects.

It'd be nice to know the exact sorting you're trying to reproduce, i.e. the exact value of NLS_SORT. You can find out by executing

SELECT SYS_CONTEXT ('USERENV', 'NLS_SORT') from SYS.DUAL;

The sort you're using produces

A, a, Á, á, Ä, ä, B, b, C, c

It's not clear what the order of input is.

  • It puts A before a. This is odd. I am inferring it's not actually preferring A over a but considers them equal, i.e. is case-insensitive.
  • It puts un-accented letters before letters with accents so I'm inferring it's accent-sensitive.

An NLS_SORT of GENERIC_M_CI fits the bill. You can check by running it in oracle:

[...] ORDER BY NLSSORT(<colname>, 'NLS_SORT=GENERIC_M_CI');

A Java Collator has a setStrength() method which accepts values PRIMARY, SECONDARY, TERTIARY and IDENTICAL.

The exact interpretation depends on the locale but the javadocs give as an example

  • The primary strength distinguishes between a and b only.
  • The secondary strength also distinguishes between a and á.
  • The tertiary strength also distinguishes between a and A.
  • The identical strength is only satisfied if the characters are absolutely identical.

So a Collator with strength SECONDARY should serve you fine.

On my machine, with en_US default locale, I tried this out:

List<String> strings = Arrays.asList("A", "Ä", "Á", "B", "C", "a", "á", "ä", "b", "c");
Collator collator = Collator.getInstance();
collator.setStrength(Collator.SECONDARY);
Collections.sort(strings, collator);
System.out.println(strings);

Prints

[A, a, Á, á, Ä, ä, B, b, C, c]

(But if you'd put the a before the A, it'd have left that order untouched.)

Upvotes: 6

ProgrammersBlock
ProgrammersBlock

Reputation: 6284

1)You'll need one string that represents the order of the Oracle sort. I'm calling this string oracleSort. You could try to search the internet for this or you could insert one row of every letter into the database, query that column and get the results back. From the results assemble your sort string. It sounds labor intensive but you could also use a Java program to fill the database.

oracleSort = "AaÁáÄäBbCc..."

2)I think you would need to implement a Comparator that compares two strings. http://docs.oracle.com/javase/1.5.0/docs/api/java/util/Comparator.html Go character by character through the strings and compare the indices of the characters as found in oracleSort. The first character with the lesser index than its corresponding position is the lesser string.

oracleSort.indexOf("a") is 1.

oracleSort.indexOf("Á") is 2.

"a" is less than "Á"

3)Later I was thinking there might be an alternative to the pattern. It looks like the order is Grouped by letter without accents, then groups sorted by ASCII uppercase, then uppercase before lowercase.

Therefore, you could use Apache commons-lang StringUtils.stripAccents to make string copies without accents and uppercase. If those are equal, the compare copies with accents but uppercase. If those are equal, then check each character to see if one is uppercase and one is lowercase.

public static int compare(String one, String two)
{
    String oneNoAccent = StringUtils.stripAccents(one).toUpperCase();
    String twoNoAccent = StringUtils.stripAccents(two).toUpperCase();
    int compare = oneNoAccent.compareTo(twoNoAccent);
    if(compare == 0)
    {
        String oneU = one.toUpperCase();
        String twoU = two.toUpperCase();
        compare = oneU.compareTo(twoU);
        if(compare == 0)
        {
            //TODO:
        }
    }
    return compare;
}

Upvotes: 0

Related Questions