Duncan Krebs

Reputation: 3502

Cyrillic alphabet validation

I came across an interesting defect today. I have a deployment of my web application in Russia, and the name value "Наталья" does not return true from the isAlphaNumeric method below. I'm curious how people would approach a problem like this. - Duncan

private boolean isAlphaNumeric(String str) {
    return str.matches("[\\w-']+");
}

Upvotes: 13

Views: 17219

Answers (2)

Ilya Serbis

Reputation: 22303

In my case I have to check whether it's a name written in Russian.

I've ended up with this:

private static final String ruNameRegEx = "[А-ЯЁ][-А-яЁё]+";

and for the full name:

private static final String ruNamePart = "[А-яЁё][-А-яЁё]+";
private static final String ruFullNameRegEx = "\\s*[А-ЯЁ][-А-яЁё]+\\s+(" + ruNamePart + "\\s+){1,5}" + ruNamePart + "\\s*";

The last one covers some complex cases:

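import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertTrue;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.junit.Test;

// Renamed from Test so the class name does not shadow the JUnit @Test annotation.
// Assumes JUnit 4 and that ruFullNameRegEx (defined above) is visible in this class.
public class RuFullNameTest {
    Pattern ruFullNamePattern = Pattern.compile(ruFullNameRegEx);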

    @Test
    public void test1() {
        assertTrue(isRuFullName("Иванов Василий Иванович"));
    }

    @Test
    public void test2() {
        assertTrue(isRuFullName(" Иванов Василий Акимович "));
    }

    @Test
    public void test3() {
        assertTrue(isRuFullName("Ёлкин Василий Иванович"));
    }

    @Test
    public void test4() {
        assertTrue(isRuFullName("Иванов Василий Аксёнович"));
    }

    @Test
    public void test5() {
        assertFalse(isRuFullName("иванов василий акимович"));
    }

    @Test
    public void test6() {
        assertFalse(isRuFullName("Иванов С.В."));
    }

    @Test
    public void test7() {
        assertTrue(isRuFullName("Мамин-Сибиряк Анна-Мария Иоановна"));
    }

    @Test
    public void test8() {
        assertTrue(isRuFullName("Хаджа Насредин Махмуд-Азгы-Бек"));
    }

    @Test
    public void test9() {
        assertTrue(isRuFullName("Хаджа Насредин ибн Махмуд"));
    }

    private boolean isRuFullName(String testString) {
        Matcher m = ruFullNamePattern.matcher(testString);
        return m.matches();
    }
}
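
The explicit Ё and ё in the character classes above are needed because the range А-я covers only U+0410 to U+044F, while Ё (U+0401) and ё (U+0451) sit outside it. A small sketch to illustrate this; the class name CyrillicRangeDemo and its helper methods are hypothetical and only for demonstration:

```java
public class CyrillicRangeDemo {
    // True if c lies in the contiguous block А..я (U+0410..U+044F).
    public static boolean inBasicRange(char c) {
        return c >= 'А' && c <= 'я';
    }

    // Same single-name pattern as ruNameRegEx, but without the explicit Ё/ё.
    public static boolean matchesWithoutYo(String s) {
        return s.matches("[А-Я][-А-я]+");
    }

    // The single-name pattern from the answer, with Ё/ё listed explicitly.
    public static boolean matchesWithYo(String s) {
        return s.matches("[А-ЯЁ][-А-яЁё]+");
    }

    public static void main(String[] args) {
        System.out.println(inBasicRange('Ё'));          // false
        System.out.println(inBasicRange('ё'));          // false
        System.out.println(matchesWithoutYo("Ёлкин"));  // false
        System.out.println(matchesWithYo("Ёлкин"));     // true
        System.out.println(matchesWithYo("Аксёнович")); // true
    }
}
```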

Upvotes: 15

Op De Cirkel

Reputation: 29493

You have to use a Unicode-aware regex, for example \p{L}+ for any Unicode letter. For more, see the Javadoc for java.util.regex.Pattern, which has a section called "Unicode support". Also, there are details here: link
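
A minimal sketch of how this might be applied to the method from the question. The class name UnicodeNameCheck is hypothetical, and treating "alphanumeric" as Unicode letters, Unicode numbers and the underscore is an assumption that approximates what \w covers for ASCII. The second pattern uses the Pattern.UNICODE_CHARACTER_CLASS flag (available since Java 7), which makes \w itself Unicode-aware:

```java
import java.util.regex.Pattern;

public class UnicodeNameCheck {
    // \p{L} = any Unicode letter, \p{N} = any Unicode number.
    // The underscore, apostrophe and hyphen are listed literally (hyphen last).
    private static final Pattern LETTERS_AND_NUMBERS =
            Pattern.compile("[\\p{L}\\p{N}_'-]+");

    // Alternative: keep \w but ask the engine to apply Unicode semantics to it.
    private static final Pattern UNICODE_WORD =
            Pattern.compile("[\\w'-]+", Pattern.UNICODE_CHARACTER_CLASS);

    public static boolean isAlphaNumeric(String str) {
        return LETTERS_AND_NUMBERS.matcher(str).matches();
    }

    public static boolean isAlphaNumericWithFlag(String str) {
        return UNICODE_WORD.matcher(str).matches();
    }

    public static void main(String[] args) {
        System.out.println(isAlphaNumeric("Наталья"));          // true
        System.out.println(isAlphaNumericWithFlag("Наталья"));  // true
        System.out.println(isAlphaNumeric("O'Brien-Smith"));    // true
        System.out.println(isAlphaNumeric("Наталья!"));         // false
    }
}
```

Unlike the Russian-only patterns in the other answer, this accepts letters from any script, so it suits deployments in several locales but does not check that a name is written in Cyrillic.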

Upvotes: 18
