Reputation: 1
If a name such as "ОХ699" is typed using a different keyboard layout, "ОХ" is flagged as non-English characters, even though the letters look like English ones. Is there any way to convert characters like these to English characters directly?
I tried the following code to convert "ОХ" to the English letters "OX":
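import java.text.Normalizer;
String subjectString = "ОХ699"; // О and Х here are Cyrillic letters
subjectString = Normalizer.normalize(subjectString, Normalizer.Form.NFD);
// Strips every character outside the ASCII range
String resultString = subjectString.replaceAll("[^\\x00-\\x7F]", "");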
But it does not convert them to English letters. Actual output: "699". Expected output: "OX699".
Upvotes: -1
Views: 654
Reputation: 758
It is not possible using the standard library alone. You have to implement your own translation table. Some people want to map Р (the Cyrillic letter Er, which sounds like R) to the look-alike Latin P, while others want the phonetic R. Chinese characters and emoji are a further problem.
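The code in the question returns "699" because the Cyrillic О and Х have no canonical decomposition into Latin letters, so NFD leaves them unchanged and the regex then strips them. A minimal sketch of a hand-written look-alike table follows. The class name `HomoglyphFolder`, the method `fold`, and the choice of letters are all hypothetical, and the table is deliberately incomplete (capitals only, mapped by appearance rather than by sound):

```java
import java.util.HashMap;
import java.util.Map;

final class HomoglyphFolder {
    // Hypothetical, incomplete table: Cyrillic capitals -> Latin look-alikes.
    private static final Map<Character, Character> LOOKALIKES = new HashMap<>();
    static {
        LOOKALIKES.put('\u0410', 'A'); // А
        LOOKALIKES.put('\u0412', 'B'); // В
        LOOKALIKES.put('\u0415', 'E'); // Е
        LOOKALIKES.put('\u041A', 'K'); // К
        LOOKALIKES.put('\u041C', 'M'); // М
        LOOKALIKES.put('\u041D', 'H'); // Н
        LOOKALIKES.put('\u041E', 'O'); // О
        LOOKALIKES.put('\u0420', 'P'); // Р (by appearance, not by sound)
        LOOKALIKES.put('\u0421', 'C'); // С
        LOOKALIKES.put('\u0422', 'T'); // Т
        LOOKALIKES.put('\u0425', 'X'); // Х
    }

    // Replaces each mapped character; everything else passes through unchanged.
    static String fold(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            char mapped = LOOKALIKES.getOrDefault(c, c);
            out.append(mapped);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(fold("\u041E\u0425699")); // prints OX699
    }
}
```

Characters that are not in the table are kept as they are, so you can still run the ASCII-only `replaceAll` from the question afterwards if you want to drop whatever could not be mapped.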
There is a Linux program, uni2ascii, that does exactly what you want - you can see how it is implemented at https://salsa.debian.org/debian/uni2ascii/-/blob/master/uni2ascii.c (see the extremely big switch statements).
There is also a Python clone of this program, though a very simplified one - https://github.com/ajanin/uni2ascii/blob/master/uni2ascii/__init__.py#L65 . You can copy that switch and implement the translation in your own app.
Or install uni2ascii on the server and just call it (or call it using JNI).
Either way, the common practice is just to ignore and skip non-ASCII characters.
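EDIT: I found an implementation in the Lucene engine, ASCIIFoldingFilter. This link is to the C# port (Lucene.NET); the Java original is the class of the same name in Apache Lucene - https://github.com/apache/lucenenet/blob/master/src/Lucene.Net.Analysis.Common/Analysis/Miscellaneous/ASCIIFoldingFilter.cs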
Upvotes: 1