user1323771

Reputation: 111

How to detect an appropriate String locale in Java

In my current project I need to lowercase incoming text, which may be in English, German, or Turkish. Plain String#toLowerCase() fails for some characters of the Turkish alphabet: for example, the non-ASCII character http://unicode-table.com/en/0130/ must be mapped to the ASCII character http://unicode-table.com/en/0069/. Java 7 handles this mapping without any issues if I provide the locale, i.e. str.toLowerCase(new Locale("tr")). But that means I would have to detect the appropriate locale of the given text, because it could be written in any of the three languages.
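To make the difference concrete, here is a minimal sketch of the behaviour described above. The class and method names are made up for illustration, and Locale.forLanguageTag("tr") is used as a stand-in for new Locale("tr"); both are available in Java 7.

```java
import java.util.Locale;

// Hypothetical demo class; the names are illustrative only.
final class TurkishLowercaseDemo {

    // Lowercases using Turkish casing rules.
    static String lowerTurkish(String s) {
        return s.toLowerCase(Locale.forLanguageTag("tr"));
    }

    // Lowercases using locale-neutral rules.
    static String lowerRoot(String s) {
        return s.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String dottedCapitalI = "\u0130"; // LATIN CAPITAL LETTER I WITH DOT ABOVE

        // Turkish rules: U+0130 becomes the single ASCII character U+0069.
        System.out.println(lowerTurkish(dottedCapitalI).length()); // 1

        // Locale-neutral rules: U+0130 becomes U+0069 followed by U+0307
        // (COMBINING DOT ABOVE), so the result has two chars.
        System.out.println(lowerRoot(dottedCapitalI).length()); // 2

        // The plain ASCII "I" also differs: Turkish rules map it to U+0131
        // (dotless i), locale-neutral rules map it to "i".
        System.out.println(lowerTurkish("I").equals("\u0131")); // true
        System.out.println(lowerRoot("I").equals("i"));         // true
    }
}
```

Note that the plain ASCII "I" is locale-sensitive too, so the locale choice matters even for text without any non-ASCII characters.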

Is there any way to detect the appropriate locale, or is this approach wrong?

EDIT 1

I didn't mention the actual use case: I'm adding tags to an entity via a REST API, and I guess I'm not allowed to change the API contract.

Upvotes: 0

Views: 1400

Answers (2)

AlexR

Reputation: 115328

There is probably a library that does this, but I don't know of one. I can, however, offer a simple solution.

German and Turkish each have only a few special characters. All other characters are plain English, so the problem does not arise for them. So you can keep a list of the special German and Turkish characters and detect the locale of the current string by searching it for these characters. If one of the Turkish characters is found, process the string in the Turkish locale; the same goes for German. If none of the special characters is found, use the default locale.

This solution has a small performance penalty because you scan the string twice, but this does not matter for most applications.
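The heuristic described above could be sketched roughly as follows. This is only an illustration under a few assumptions: the class name, the method names, and the two character lists are made up for the example (the lists are not exhaustive), and Locale.ROOT stands in for the "default locale" so that the result does not depend on the machine's settings. It checks both lists in a single pass over the string.

```java
import java.util.Locale;

// Hypothetical helper; names and character lists are assumptions for this sketch.
final class TagLocaleGuesser {

    // Letters used in Turkish but not in German or English:
    // dotted capital I, dotless i, G/g with breve, S/s with cedilla, C/c with cedilla.
    private static final String TURKISH_ONLY =
            "\u0130\u0131\u011E\u011F\u015E\u015F\u00C7\u00E7";

    // Letters used in German but not in Turkish or English:
    // A/a with diaeresis, sharp s (lowercase and capital).
    private static final String GERMAN_ONLY = "\u00C4\u00E4\u00DF\u1E9E";

    private static final Locale TURKISH = Locale.forLanguageTag("tr");

    // Guesses a locale from the special characters found in the text.
    static Locale guessLocale(String text) {
        boolean sawGerman = false;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (TURKISH_ONLY.indexOf(c) >= 0) {
                return TURKISH; // Turkish wins because its casing rules differ
            }
            if (GERMAN_ONLY.indexOf(c) >= 0) {
                sawGerman = true;
            }
        }
        // Assumption: fall back to locale-neutral rules when nothing special is found.
        return sawGerman ? Locale.GERMAN : Locale.ROOT;
    }

    // Lowercases the text using the guessed locale.
    static String lowerCase(String text) {
        return text.toLowerCase(guessLocale(text));
    }
}
```

Keep in mind that this can guess wrong: a Turkish word written entirely with ASCII letters contains no marker character, so a capital "I" in it would be lowercased to "i" rather than the dotless "ı".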

Upvotes: 1

wero

Reputation: 32980

There are libraries which use heuristics to detect a language with a certain probability. An example can be found here.

Upvotes: 1
