Pawel P.

Reputation: 4089

Sorting String with non-western characters

I want to print the Polish names of all available languages, sorted alphabetically.

import java.util.*;

public class Tmp
{
  public static void main(String... args)
  {
    Locale.setDefault(new Locale("pl","PL"));
    Locale[] locales = Locale.getAvailableLocales();
    ArrayList<String> langs = new ArrayList<String>();
    for(Locale loc: locales) {
      String  lng = loc.getDisplayLanguage();
      if(!lng.trim().equals("") && ! langs.contains(lng)){
        langs.add(lng);
      }
    }
    Collections.sort(langs);
    for(String str: langs){
      System.out.println(str);
    }
  }
}

Unfortunately, I have an issue with the sorting part. The output is:

:
:
kataloński
koreański
litewski
macedoński
:
:
węgierski
włoski
łotewski

However, in Polish ł comes after l and before m, so the output should be:

:
:
kataloński
koreański
litewski
łotewski
macedoński
:
:
węgierski
włoski

How can I accomplish that? Is there a universal, language-independent method (say I later want to display and sort these names in another language with different sorting rules)?

Upvotes: 8

Views: 7512

Answers (6)

Steven.Nguyen

Reputation: 1044

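Something like this (Kotlin):

import java.text.Collator
import java.util.Locale

// Create the collator once instead of once per comparison; Collator is itself a Comparator
val polishCollator = Collator.getInstance(Locale("pl", "PL"))
val sorted = yourCollection.sortedWith(polishCollator)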

Upvotes: 1

sfive

Reputation: 11

I'm dealing with the same problem. I found that the locale Collator solution works fine on Android 7.0, but not on earlier Android versions, so I implemented the following algorithm. It is pretty fast (I sort more than 3000 strings) and also works on earlier Android versions.

import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;

// Dictionary is the app's own model class (not java.util.Dictionary);
// getOriginal() returns the string to sort on.
public class SortBasedOnName implements Comparator<Dictionary> {

    // Sort weight of each character, in Polish alphabetical order.
    private static final Map<Character, Integer> POL_CHAR_TABLE = new HashMap<Character, Integer>();
    static {
        POL_CHAR_TABLE.put(' ', 0x0020);
        POL_CHAR_TABLE.put('!', 0x0021);
        POL_CHAR_TABLE.put('"', 0x0022);
        // ... further entries elided in the original answer ...

        POL_CHAR_TABLE.put('a', 0x0040);
        POL_CHAR_TABLE.put('ą', 0x0041);
        POL_CHAR_TABLE.put('b', 0x0042);
        POL_CHAR_TABLE.put('c', 0x0043);
        POL_CHAR_TABLE.put('ć', 0x0044);
        // ... further entries elided in the original answer ...

        POL_CHAR_TABLE.put('{', 0x0066);
        POL_CHAR_TABLE.put('|', 0x0067);
        POL_CHAR_TABLE.put('}', 0x0068);
    }

    @Override
    public int compare(Dictionary d1, Dictionary d2) {
        return strCompareWithDiacritics(d1.getOriginal(), d2.getOriginal());
    }

    // Characters missing from the table sort after all mapped ones (by char value)
    // instead of causing a NullPointerException on unboxing.
    private static int weight(char c) {
        Integer w = POL_CHAR_TABLE.get(c);
        return (w != null) ? w : 0x10000 + c;
    }

    private static int strCompareWithDiacritics(String s1, String s2) {
        s1 = s1.toLowerCase();
        s2 = s2.toLowerCase();
        int length = Math.min(s1.length(), s2.length());

        // Compare character by character using the table weights.
        for (int i = 0; i < length; i++) {
            int diff = weight(s1.charAt(i)) - weight(s2.charAt(i));
            if (diff != 0) {
                return diff < 0 ? -1 : 1;
            }
        }
        // One string is a prefix of the other: the shorter one comes first.
        return Integer.compare(s1.length(), s2.length());
    }
}
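To try the table-driven idea without the app-specific Dictionary class, here is a minimal sketch on plain strings. The class name PolishAlphabetComparator and the ALPHABET string are hypothetical, not part of the answer: they assume the lowercase Polish alphabet (including q, v and x) is the whole table, and that any other character sorts before all letters, ordered by its char value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class PolishAlphabetComparator implements Comparator<String> {

    // Assumption: lowercase Polish alphabet (plus q, v, x) in sorting order.
    private static final String ALPHABET = "aąbcćdeęfghijklłmnńoópqrsśtuvwxyzźż";

    // Weight of each letter = its position in ALPHABET.
    private static final Map<Character, Integer> WEIGHTS = new HashMap<>();
    static {
        for (int i = 0; i < ALPHABET.length(); i++) {
            WEIGHTS.put(ALPHABET.charAt(i), i);
        }
    }

    // Characters outside the table get a negative weight, so they sort
    // before all letters, ordered by their char value.
    private static int weight(char c) {
        Integer w = WEIGHTS.get(c);
        return (w != null) ? w : c - 0x10000;
    }

    @Override
    public int compare(String s1, String s2) {
        String a = s1.toLowerCase(Locale.ROOT);
        String b = s2.toLowerCase(Locale.ROOT);
        int n = Math.min(a.length(), b.length());
        for (int i = 0; i < n; i++) {
            int diff = weight(a.charAt(i)) - weight(b.charAt(i));
            if (diff != 0) {
                return diff;
            }
        }
        // Common prefix: the shorter string comes first.
        return Integer.compare(a.length(), b.length());
    }

    public static void main(String[] args) {
        List<String> langs = new ArrayList<>(Arrays.asList(
                "włoski", "łotewski", "macedoński", "litewski", "węgierski"));
        langs.sort(new PolishAlphabetComparator());
        System.out.println(langs);
    }
}
```

With this table the sample list sorts as litewski, łotewski, macedoński, węgierski, włoski. The trade-off is that every language needs its own hand-maintained table, which is exactly what a locale-aware Collator provides out of the box.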

Upvotes: 0

Evgeniy Dorofeev

Reputation: 135992

Try:

Collections.sort(langs, Collator.getInstance(new Locale("pl", "PL")));

It will produce:

...
litewski
łotewski
...
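A minimal sketch of the question's program with this fix applied, assuming Java 8 or later. SortedLanguageNames and sortForLocale are hypothetical names. The method takes the Locale as a parameter, so the same code can sort for another language by passing a different Locale, as far as the JDK ships collation rules for it. The sketch also passes the Locale to getDisplayLanguage(Locale) instead of changing the JVM default, and uses a LinkedHashSet instead of List.contains to drop duplicates.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class SortedLanguageNames {

    // Returns a new list sorted by the collation rules of the given locale.
    public static List<String> sortForLocale(List<String> names, Locale locale) {
        List<String> sorted = new ArrayList<>(names);
        Collections.sort(sorted, Collator.getInstance(locale));
        return sorted;
    }

    public static void main(String[] args) {
        Locale polish = Locale.forLanguageTag("pl-PL");
        // Unique, non-blank language names as displayed in Polish.
        Set<String> unique = new LinkedHashSet<>();
        for (Locale loc : Locale.getAvailableLocales()) {
            String lng = loc.getDisplayLanguage(polish);
            if (!lng.trim().isEmpty()) {
                unique.add(lng);
            }
        }
        for (String str : sortForLocale(new ArrayList<>(unique), polish)) {
            System.out.println(str);
        }
    }
}
```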

See the Collator API documentation for details.
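One detail worth knowing from that API is the collator's strength, set with setStrength. The sketch below is only an illustration: the class name CollatorStrengthDemo and the sample strings are hypothetical. It shows that at TERTIARY strength a case difference counts, while at SECONDARY strength it is ignored.

```java
import java.text.Collator;
import java.util.Locale;

public class CollatorStrengthDemo {

    // Compares two strings with a Polish collator set to the given strength.
    public static int compareAt(String a, String b, int strength) {
        Collator collator = Collator.getInstance(Locale.forLanguageTag("pl-PL"));
        collator.setStrength(strength);
        return collator.compare(a, b);
    }

    public static void main(String[] args) {
        // TERTIARY treats case differences as significant: non-zero result.
        System.out.println(compareAt("Litewski", "litewski", Collator.TERTIARY));
        // SECONDARY ignores case differences: prints 0.
        System.out.println(compareAt("Litewski", "litewski", Collator.SECONDARY));
    }
}
```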

Upvotes: 11

Simon Dorociak

Reputation: 33495

Unfortunately in Polish ł comes after l and before m so the output should be:

You can define your own Comparable or Comparator implementation.

This might also help you:

Upvotes: 1

Dilum Ranatunga

Reputation: 13374

Have a look at java.text.Collator.getInstance(Locale). You need to supply the Polish locale in your case. Collators implement the Comparator interface, so you can use them in sort APIs and in sorted data structures like TreeSet.
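As a sketch of the TreeSet case (the class and method names are hypothetical), a TreeSet built with a Polish collator keeps the names sorted and also removes duplicates, which would replace the langs.contains check in the question's loop.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

public class CollatedTreeSet {

    // Collects names into a TreeSet ordered by the locale's collator;
    // names the collator considers equal are stored only once.
    public static List<String> sortedDistinct(Iterable<String> names, Locale locale) {
        Set<String> set = new TreeSet<String>(Collator.getInstance(locale));
        for (String name : names) {
            set.add(name);
        }
        return new ArrayList<>(set);
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("włoski", "łotewski", "litewski", "łotewski");
        System.out.println(sortedDistinct(names, Locale.forLanguageTag("pl-PL")));
    }
}
```

Note that a TreeSet decides equality with the comparator, not equals(): with a weaker strength such as Collator.SECONDARY, "Łotewski" and "łotewski" would collapse into a single entry.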

Upvotes: 2

Joni

Reputation: 111219

You should pass a Collator to the sort method:

// sort according to default locale
Collections.sort(langs, Collator.getInstance());

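Without a Collator, Collections.sort uses String.compareTo, which orders strings by their UTF-16 char values (Unicode code units). That is not the correct alphabetical order in any language: for example, every uppercase ASCII letter sorts before every lowercase one, and ł (U+0142) sorts after all of a to z.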
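The difference is easy to see side by side. In this sketch (the class name and sample words are hypothetical), the Locale is passed explicitly rather than relying on the JVM default, so the result does not depend on the machine's settings.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

public class NaturalVsCollated {

    // Sorts with String.compareTo, i.e. by UTF-16 char values.
    public static List<String> natural(List<String> in) {
        List<String> out = new ArrayList<>(in);
        Collections.sort(out);
        return out;
    }

    // Sorts with the collator for the given locale.
    public static List<String> collated(List<String> in, Locale locale) {
        List<String> out = new ArrayList<>(in);
        Collections.sort(out, Collator.getInstance(locale));
        return out;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("macedoński", "łotewski", "Zulu", "angielski");
        System.out.println(natural(words));
        System.out.println(collated(words, Locale.forLanguageTag("pl-PL")));
    }
}
```

The natural order gives Zulu, angielski, macedoński, łotewski (uppercase Z first, ł last), while the Polish collator gives angielski, łotewski, macedoński, Zulu.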

Upvotes: 7
