ccheneson

Reputation: 49410

Sort a list of Hungarian strings in Hungarian alphabetical order

I am currently working with some data in Hungarian, and I have to sort a list of Hungarian strings.

According to this Collation Sequence page:

The Hungarian alphabetic order is: A=Á, B, C, CS, D, DZ, DZS, E=É, F, G, GY, H, I=Í, J, K, L, LY, M, N, NY, O=Ó, Ö=Ő, P, Q, R, S, SZ, T, TY, U=Ú, Ü=Ű, V, W, X, Y, Z, ZS

So an accented vowel is treated the same as its plain form (A=Á, ...), and with a Collator the result can look like this:

Abdffg
Ádsdfgsd
Aegfghhrf

Up to here, no problem :)

But now I have a requirement to sort according to the Hungarian alphabet:

A Á B C Cs D Dz Dzs E É F G Gy H I Í J K L Ly M N Ny O Ó Ö Ő P (Q) R S Sz T Ty U Ú Ü Ű V (W) (X) (Y) Z Zs

Here A is considered different from Á.

Playing with the strength of the Collator doesn't change the order in the output: A and Á are still mixed up.

Are there any libraries or tricks for sorting a list of strings according to the Hungarian alphabetical order?

So far, this is what I am doing. It looks like too much hassle for the task, no?

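import java.text.Collator;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

List<String> words = Arrays.asList(
        "Árfolyam", "Az",
        "Állásajánlatok", "Adminisztráció",
        "Zsfgsdgsdfg", "Qdfasfas"
);

// Rank of each vowel: the accented form sorts right after the plain one.
final Map<String, Integer> map = new HashMap<String, Integer>();
map.put("A", 0);
map.put("Á", 1);
map.put("E", 2);
map.put("É", 3);
map.put("I", 4);
map.put("Í", 5);
map.put("O", 6);
map.put("Ó", 7);
map.put("Ö", 8);
map.put("Ő", 9);
map.put("U", 10);
map.put("Ú", 11);
map.put("Ü", 12);
map.put("Ű", 13);

final Collator c = Collator.getInstance(new Locale("hu"));
c.setStrength(Collator.TERTIARY);

Collections.sort(words, new Comparator<String>() {
    public int compare(String s1, String s2) {
        // Only the first (upper-case) letter is ranked; the rest is left to the Collator.
        if (s1.isEmpty() || s2.isEmpty()) return c.compare(s1, s2);

        Integer a = map.get(Character.toString(s1.charAt(0)));
        Integer b = map.get(Character.toString(s2.charAt(0)));

        // Both words start with a mapped vowel of different rank: the rank decides.
        // equals/compareTo are used because == on Integer compares references.
        if (a != null && b != null && !a.equals(b)) {
            return a.compareTo(b);
        }

        // Otherwise keep the Collator's order instead of reporting a false tie.
        return c.compare(s1, s2);
    }
});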

Thanks for your input

Upvotes: 14

Views: 5099

Answers (4)

Greg

Reputation: 471

With a stream, you can sort like this:

public List<String> sortBy(List<String> sortable) {

  Collator coll = Collator.getInstance(new Locale("hu","HU"));

  return sortable.stream()
                 .sorted(Comparator.comparing(s -> s, coll))
                 .collect(Collectors.toList());
}

Upvotes: 2

lsolova

Reputation: 106

I found a good idea: you can use a RuleBasedCollator.

Source: http://download.oracle.com/javase/tutorial/i18n/text/rule.html

And here is the Hungarian rule:

 < a,A < á,Á < b,B < c,C < cs,Cs,CS < d,D < dz,Dz,DZ < dzs,Dzs,DZS 
 < e,E < é,É < f,F < g,G < gy,Gy,GY < h,H < i,I < í,Í < j,J
 < k,K < l,L < ly,Ly,LY < m,M < n,N < ny,Ny,NY < o,O < ó,Ó 
 < ö,Ö < ő,Ő < p,P < q,Q < r,R < s,S < sz,Sz,SZ < t,T 
 < ty,Ty,TY < u,U < ú,Ú < ü,Ü < ű,Ű < v,V < w,W < x,X < y,Y < z,Z < zs,Zs,ZS
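A minimal sketch of how this rule could be passed to a RuleBasedCollator. The class name HungarianCollation and the method sortHungarian are hypothetical names chosen for illustration. The accented letters are written as Unicode escapes only so that the source compiles whatever the file encoding is; the rule itself is the one above.

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper class; the name is an assumption made for this sketch.
public class HungarianCollation {

    // The rule from the answer, with accented letters as Unicode escapes.
    static final String RULES =
          "< a,A < \u00e1,\u00c1 < b,B < c,C < cs,Cs,CS < d,D < dz,Dz,DZ < dzs,Dzs,DZS"
        + " < e,E < \u00e9,\u00c9 < f,F < g,G < gy,Gy,GY < h,H < i,I < \u00ed,\u00cd < j,J"
        + " < k,K < l,L < ly,Ly,LY < m,M < n,N < ny,Ny,NY < o,O < \u00f3,\u00d3"
        + " < \u00f6,\u00d6 < \u0151,\u0150 < p,P < q,Q < r,R < s,S < sz,Sz,SZ < t,T"
        + " < ty,Ty,TY < u,U < \u00fa,\u00da < \u00fc,\u00dc < \u0171,\u0170"
        + " < v,V < w,W < x,X < y,Y < z,Z < zs,Zs,ZS";

    // Returns a sorted copy; the input list is left untouched.
    public static List<String> sortHungarian(List<String> words) {
        try {
            RuleBasedCollator collator = new RuleBasedCollator(RULES);
            List<String> sorted = new ArrayList<String>(words);
            Collections.sort(sorted, collator);
            return sorted;
        } catch (ParseException e) {
            // Thrown by the constructor when the rule string is malformed.
            throw new IllegalArgumentException("Invalid collation rules", e);
        }
    }

    public static void main(String[] args) {
        // The word list from the question.
        List<String> words = Arrays.asList(
            "\u00c1rfolyam", "Az", "\u00c1ll\u00e1saj\u00e1nlatok",
            "Adminisztr\u00e1ci\u00f3", "Zsfgsdgsdfg", "Qdfasfas");
        System.out.println(sortHungarian(words));
    }
}
```

Because each "<" in the rule starts a new primary difference, a and á become distinct letters rather than accent variants, so words starting with A should all come before words starting with Á. Characters that the rule does not mention (digits, punctuation) are not covered by this sketch.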

Upvotes: 9

petIQe

Reputation: 33

Will any of these solutions order the strings (names) 'Czár' and 'Csóka' as Czár, Csóka? That would be the correct order, since the CS in Csóka is considered a single letter that comes after C. However, recognizing double-character consonants is impossible even with a list of all Hungarian words: two words can look exactly the same character by character, yet in one of them the two characters are separate consonants, while in the other the same two characters at the very same place represent a single letter.
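One way to check the Czár/Csóka case is a RuleBasedCollator in which cs is declared as a contraction ordered after c. The sketch below is only an illustration: the rule string is trimmed to the letters these two names need, the class and method names are hypothetical, and accented letters are written as Unicode escapes so the source compiles whatever the file encoding is. It does not address the ambiguity described above, since every c followed by s is treated as the single letter cs.

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;

// Hypothetical class name, used only for this sketch.
public class DigraphCheck {

    // Trimmed rule: just the letters needed for the two names, with cs after c.
    static final String RULES =
        "< a,A < \u00e1,\u00c1 < c,C < cs,Cs,CS < k,K < \u00f3,\u00d3 < r,R < z,Z";

    // Negative result: first sorts before second; positive: after; zero: tie.
    public static int compareNames(String first, String second) {
        try {
            return new RuleBasedCollator(RULES).compare(first, second);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Invalid collation rules", e);
        }
    }

    public static void main(String[] args) {
        int result = compareNames("Cz\u00e1r", "Cs\u00f3ka");
        System.out.println(result < 0 ? "Czar sorts first" : "Csoka sorts first");
    }
}
```

The comparison is decided on the first collation element: "C" in Czár against the contraction "Cs" in Csóka, which the rule places later.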

Upvotes: 1

Russell Shingleton

Reputation: 3196

Change the order of your map.

Put the numeric representation as the key and the letter as the value. This will allow you to use a TreeMap which will be sorted by key.

You can then just do map.get(1) and it will return the first letter of the alphabet.
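A minimal sketch of that reversed map, assuming positions start at 1 so that map.get(1) is the first letter. The class name AlphabetIndex and the method build are hypothetical, and only the first few letters are listed; this covers the position-to-letter lookup only, not the comparison of whole strings.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical class name, used only for this sketch.
public class AlphabetIndex {

    // Builds position -> letter; a TreeMap keeps its entries sorted by key.
    public static TreeMap<Integer, String> build() {
        // Only the start of the alphabet; the accented letter is a Unicode escape.
        String[] letters = {"A", "\u00c1", "B", "C", "Cs", "D"};
        TreeMap<Integer, String> map = new TreeMap<Integer, String>();
        for (int i = 0; i < letters.length; i++) {
            map.put(i + 1, letters[i]); // positions are 1-based
        }
        return map;
    }

    public static void main(String[] args) {
        // Iteration follows key order, i.e. alphabet order.
        for (Map.Entry<Integer, String> entry : build().entrySet()) {
            System.out.println(entry.getKey() + " -> " + entry.getValue());
        }
    }
}
```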

Upvotes: 0
