Dmytro

Reputation: 2319

Regex to replace all Turkish symbols with regular Latin symbols

I have a class that replaces all Turkish symbols with similar Latin symbols and passes the result to a searcher.

These are the methods for symbol replacement:

@Override
String replaceTurkish(String words) {
    if (checkWithRegExp(words)) {
        return words.toLowerCase().replaceAll("ç", "c").replaceAll("ğ", "g").replaceAll("ı", "i")
                .replaceAll("ö", "o").replaceAll("ş", "s").replaceAll("ü", "u");
    } else return words;
}

public static boolean checkWithRegExp(String word) {
    Pattern p = Pattern.compile("[öçğışü]");
    Matcher m = p.matcher(word);
    return m.matches();
}

But this always returns the words unmodified.

What am I doing wrong?

Thanks in advance!

Upvotes: 2

Views: 4415

Answers (1)

Jeutnarg

Reputation: 1178

Per the Java 7 API docs, Matcher.matches():

Attempts to match the entire region against the pattern.

Your pattern is "[öçğışü]", which regex101.com (an awesome resource) says will match:

a single character in the list öçğışü literally

Perhaps you can see the problem already. matches() returns true only when the whole input is exactly one Turkish character, because you are matching the entire region against a regex that only ever accepts one character. Any longer word, even one full of Turkish letters, fails the check.
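A quick way to see this is to run the question's pattern through both matches() and find() on a few inputs. The class name MatchesVsFind and the sample words are only for illustration:

```java
import java.util.regex.Pattern;

public class MatchesVsFind {
    // The pattern from the question: exactly one character from the set.
    static final Pattern TURKISH = Pattern.compile("[öçğışü]");

    // matches() succeeds only if the WHOLE input is one listed character.
    static boolean wholeInputMatches(String s) {
        return TURKISH.matcher(s).matches();
    }

    // find() succeeds if the pattern occurs ANYWHERE in the input.
    static boolean occursSomewhere(String s) {
        return TURKISH.matcher(s).find();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"ç", "çiçek", "hello"}) {
            System.out.println(s + ": matches()=" + wholeInputMatches(s)
                    + ", find()=" + occursSomewhere(s));
        }
    }
}
```

For "çiçek", matches() returns false while find() returns true, which is exactly the symptom in the question.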

I recommend either using find(), as Andreas suggested in the comments, or using a regex like this:

".*[öçğışü].*"

which should match any word that contains at least one Turkish-specific character.
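Here is a minimal sketch of the question's two methods with find() swapped in for matches(). It is written as a standalone class (TurkishReplacer is a made-up name) rather than the asker's overriding method, and it compiles the pattern once instead of on every call. It also uses String.replace(char, char) instead of replaceAll, since these are literal one-character substitutions and need no regex:

```java
import java.util.regex.Pattern;

public class TurkishReplacer {
    // Compiled once and reused; same character set as the question.
    private static final Pattern TURKISH = Pattern.compile("[öçğışü]");

    // find() searches for the pattern anywhere in the input, unlike matches().
    public static boolean checkWithRegExp(String word) {
        return TURKISH.matcher(word).find();
    }

    public static String replaceTurkish(String words) {
        if (checkWithRegExp(words)) {
            // Literal char-for-char substitution, same mapping as the question.
            return words.toLowerCase()
                    .replace('ç', 'c').replace('ğ', 'g').replace('ı', 'i')
                    .replace('ö', 'o').replace('ş', 's').replace('ü', 'u');
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(replaceTurkish("çiçek")); // prints "cicek"
        System.out.println(replaceTurkish("hello")); // prints "hello"
    }
}
```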

Additionally, note that regex matching is case-sensitive by default, so if your input can contain upper-case variants of these letters (Ç, Ğ, İ, Ö, Ş, Ü), you should include those in the pattern as well and make sure your replace statements handle them.
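One way to cover the upper-case variants without a long chain of replaceAll calls is a small lookup table, walked one character at a time. This sketch assumes you want to keep the original casing (Ş becomes S, not s), which differs from the question's toLowerCase() approach; the class and method names are hypothetical. Because non-Turkish characters pass through unchanged, the separate regex check is no longer needed:

```java
public class TurkishFolding {
    // Turkish-specific letters and their Latin counterparts, index-aligned.
    // Upper-case forms are included so input like "ŞEKER" is handled too.
    private static final String FROM = "çğıöşüÇĞİÖŞÜ";
    private static final String TO   = "cgiosuCGIOSU";

    // Replaces each Turkish-specific letter; every other character is kept as-is.
    public static String fold(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            int idx = FROM.indexOf(c);
            sb.append(idx >= 0 ? TO.charAt(idx) : c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(fold("Şişli ÇAĞ İstanbul")); // prints "Sisli CAG Istanbul"
    }
}
```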

Finally (edit): you can make your Pattern case-insensitive, but your replaceAll calls will still need to handle upper-case letters separately. Note that Pattern.CASE_INSENSITIVE on its own only folds US-ASCII letters; for characters such as ç and ş you also need Pattern.UNICODE_CASE. Test this with your real input before relying on it.

Pattern p = Pattern.compile(".*[öçğışü].*", Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
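The effect of the flags can be checked directly. Per the Pattern Javadoc, CASE_INSENSITIVE by default assumes only US-ASCII characters are being matched, and UNICODE_CASE turns on Unicode-aware case folding. This sketch (class and method names are illustrative) runs the answer's ".*[öçğışü].*" pattern against an all-caps word with and without UNICODE_CASE:

```java
import java.util.regex.Pattern;

public class CaseFlags {
    private static final String REGEX = ".*[öçğışü].*";

    // CASE_INSENSITIVE alone: only ASCII letters are case-folded, so "Ç" != "ç".
    private static final Pattern ASCII_FOLD =
            Pattern.compile(REGEX, Pattern.CASE_INSENSITIVE);

    // CASE_INSENSITIVE | UNICODE_CASE: non-ASCII letters are case-folded too.
    private static final Pattern UNICODE_FOLD =
            Pattern.compile(REGEX, Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);

    static boolean containsTurkishAsciiFold(String s) {
        return ASCII_FOLD.matcher(s).matches();
    }

    static boolean containsTurkishUnicodeFold(String s) {
        return UNICODE_FOLD.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(containsTurkishAsciiFold("ÇAĞ"));   // prints "false"
        System.out.println(containsTurkishUnicodeFold("ÇAĞ")); // prints "true"
    }
}
```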

Upvotes: 6
