Jatel

Reputation: 40

UTF-8 to ASCII for comparison in Java

I have a list of strings that I want to compare with "singleArgument". The comparison should not be case-sensitive, so I wrote a method that lower-cases both sides. I also don't want special characters to break the comparison: if I'm looking for "ščž", singleArgument can be "scz".

case noCaseSensitive:
  final String patternSourceILike = (String) singleArgument;
  verdict = buildPattern(patternSourceILike.toLowerCase(Locale.ROOT))
    .matcher(((String) resolvedValue).toLowerCase(Locale.ROOT))
    .matches();
  break;

This is what I have for the case-insensitive comparison.

If I convert the string from UTF-8 to ASCII and then compare, the special characters turn into unknown characters.
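For illustration, here is a minimal sketch of the kind of conversion that produces this. The exact conversion used is not shown in this post, so the `getBytes` round trip and the `AsciiRoundTrip`/`toAscii` names are assumptions. Characters that US-ASCII cannot represent are replaced with `?` rather than with their base letter.

```java
import java.nio.charset.StandardCharsets;

public class AsciiRoundTrip {

    // Hypothetical helper: encode to US-ASCII and decode again.
    // String.getBytes(Charset) replaces every unmappable character
    // with the charset's replacement byte, which is '?' for US-ASCII.
    static String toAscii(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.US_ASCII);
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        System.out.println(toAscii("\u0161\u010D\u017E")); // prints "???"
        System.out.println(toAscii("scz"));                // prints "scz"
    }
}
```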

Upvotes: 0

Views: 337

Answers (1)

No idea why you'd want to do this, since removing diacritics from letters makes them completely different letters, but you can use java.text.Normalizer for this: normalize the text to its canonical decomposition (NFD), which splits each accented letter into a base letter followed by separate combining marks, then use a regex replace to strip out those (now separate) diacritics.

import java.text.Normalizer;

public class Test {
  public static void main(String[] args) {
    String input = "\u0161\u010D\u017E"; // ščž
    // NFD splits each accented letter into a base letter plus combining marks
    String canonical = Normalizer.normalize(input, Normalizer.Form.NFD);
    // \p{M} matches only the combining marks; \W would also delete spaces and punctuation
    String ascii = canonical.replaceAll("\\p{M}+", "");
    String output = String.format("%s, %s", input, ascii);
    System.out.println(output); // "ščž, scz"
  }
}
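To tie this back to the comparison in the question, one option is to apply the same folding to both sides before comparing. The sketch below is only an illustration: `DiacriticInsensitiveCompare`, `fold`, and `matchesIgnoringCaseAndDiacritics` are made-up names, and it uses a plain `equals` because `buildPattern` from the question is not shown. Note that letters without a canonical decomposition (for example "đ" or "ł") pass through NFD unchanged, so they would still need separate handling.

```java
import java.text.Normalizer;
import java.util.Locale;

public class DiacriticInsensitiveCompare {

    // Hypothetical helper: decompose to NFD, drop combining marks,
    // then lower-case so the result ignores both case and diacritics.
    static String fold(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "").toLowerCase(Locale.ROOT);
    }

    // Compares two strings after folding both sides the same way.
    static boolean matchesIgnoringCaseAndDiacritics(String a, String b) {
        return fold(a).equals(fold(b));
    }

    public static void main(String[] args) {
        // Upper-case S with caron, then c and z with caron, against plain "scz"
        System.out.println(matchesIgnoringCaseAndDiacritics("\u0160\u010D\u017E", "scz"));  // true
        System.out.println(matchesIgnoringCaseAndDiacritics("\u0161\u010D\u017E", "sczx")); // false
    }
}
```

In the question's switch case, the same idea would presumably mean folding both `patternSourceILike` and `resolvedValue` in place of the two `toLowerCase(Locale.ROOT)` calls, assuming `buildPattern` does not depend on any accented characters in the pattern source.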

Upvotes: 1
