Reputation: 2089
I want to convert my Turkish strings to lowercase in both the English and the Turkish locale. I'm doing this:
String myString="YAŞAR BAYRI";
Locale trlocale= new Locale("tr-TR");
Locale enLocale = new Locale("en_US");
Log.v("mainlist", "en source: " +myString.toLowerCase(enLocale));
Log.v("mainlist", "tr source: " +myString.toLowerCase(trlocale));
The output is:
en source: yaşar bayri
tr source: yaşar bayri
But I want to have an output like this:
en source: yasar bayri
tr source: yaşar bayrı
Is this possible in Java?
Upvotes: 22
Views: 40621
Reputation: 51
You can do it like this:
Locale trlocale= new Locale("tr","TR");
The first parameter is your language, while the other one is your country.
Upvotes: 5
Reputation: 115388
Characters ş and s are different characters. Changing the locale cannot translate one into the other. You have to create a Turkish-to-English character table and do this yourself. I once did this for Vietnamese, which has a lot of such characters. You only have to deal with 4 or 5, right? So, good luck!
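A minimal sketch of such a character table, assuming only the six Turkish letter pairs that fall outside ASCII need mapping (ç, ğ, ı/İ, ö, ş, ü). The class name TurkishAscii and the method toAscii are hypothetical, and the letters are written as Unicode escapes so the source compiles regardless of file encoding.

```java
final class TurkishAscii {
    // Turkish letters outside ASCII, lower and upper case, in this order:
    // c-cedilla, g-breve, dotless i / dotted capital I, o-umlaut, s-cedilla, u-umlaut.
    private static final String FROM =
            "\u00e7\u00c7\u011f\u011e\u0131\u0130\u00f6\u00d6\u015f\u015e\u00fc\u00dc";
    // The ASCII replacement for the character at the same index in FROM.
    private static final String TO = "cCgGiIoOsSuU";

    /** Replaces each mapped Turkish letter with its ASCII look-alike; other characters pass through. */
    static String toAscii(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            int idx = FROM.indexOf(c);
            sb.append(idx >= 0 ? TO.charAt(idx) : c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Lowercase Turkish input with s-cedilla and dotless i.
        System.out.println(toAscii("ya\u015far bayr\u0131"));
    }
}
```

Unlike a normalization-based approach, an explicit table also covers the dotless ı, which has no Unicode decomposition.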
Upvotes: 2
Reputation: 8942
If you are using the Locale constructor, you can and must set the language, country and variant as separate arguments:
new Locale(language)
new Locale(language, country)
new Locale(language, country, variant)
Therefore, your test program creates locales with the language "tr-TR" and "en_US". For your test program, you can use new Locale("tr", "TR") and new Locale("en", "US").
If you are using Java 1.7+, then you can also parse a language tag using Locale.forLanguageTag:
String myString="YAŞAR BAYRI";
Locale trlocale= Locale.forLanguageTag("tr-TR");
Locale enLocale = Locale.forLanguageTag("en-US"); // language tags use hyphens, not underscores
Calling toLowerCase with these locales then produces strings that have the appropriate lower case for each language.
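As a rough check of the above, here is a small self-contained sketch that lowercases the same string under both locales. The class name LowerCaseDemo and the helper lower are hypothetical, and the input is written with a Unicode escape for Ş so the source compiles regardless of file encoding.

```java
import java.util.Locale;

final class LowerCaseDemo {
    /** Lowercases s using the locale parsed from a BCP 47 tag such as "tr-TR". */
    static String lower(String s, String languageTag) {
        return s.toLowerCase(Locale.forLanguageTag(languageTag));
    }

    public static void main(String[] args) {
        String myString = "YA\u015eAR BAYRI";
        // English rules map capital I to the dotted i.
        System.out.println("en source: " + lower(myString, "en-US"));
        // Turkish rules map capital I to the dotless i (U+0131).
        System.out.println("tr source: " + lower(myString, "tr-TR"));
    }
}
```

Note that the English result still contains ş: the locale only changes how I and İ are cased, it does not strip diacritics.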
Upvotes: 44
Reputation: 1503090
I think this is the problem:
Locale trlocale= new Locale("tr-TR");
Try this instead:
Locale trlocale= new Locale("tr", "TR");
That's the constructor to use to specify country and language.
Upvotes: 11
Reputation: 109613
If you just want the string in ASCII, without accents, the following might do. First, each accented character is decomposed into an ASCII character and a combining diacritical mark (a zero-width accent). Then only those marks are removed with a regular expression replace.
public static String withoutDiacritics(String s) {
// Decompose any ş into s plus a combining cedilla (U+0327).
String s2 = Normalizer.normalize(s, Normalizer.Form.NFD);
return s2.replaceAll("(?s)\\p{InCombiningDiacriticalMarks}", "");
}
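A possible way to get the "en" output from the question is to lowercase with an English locale first and then strip the combining marks. This is only a sketch: the class name AsciiLowerDemo and the helper asciiLower are hypothetical, and the input uses a Unicode escape for Ş so the source compiles regardless of file encoding.

```java
import java.text.Normalizer;
import java.util.Locale;

final class AsciiLowerDemo {
    /** Decomposes accented letters (NFD) and removes the combining marks. */
    static String withoutDiacritics(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
    }

    /** Lowercases with English rules, then strips diacritics. */
    static String asciiLower(String s) {
        return withoutDiacritics(s.toLowerCase(Locale.ENGLISH));
    }

    public static void main(String[] args) {
        System.out.println(asciiLower("YA\u015eAR BAYRI"));
    }
}
```

One caveat: the dotless ı (U+0131) has no canonical decomposition, so a string already lowercased with the Turkish locale keeps its ı after this step. Lowercasing with the English locale first avoids that for this input.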
Upvotes: 3