user1044680

Reputation: 91

How do I check for non-word characters within a single word in Java?

I want to know if a String such as "equi-distant" or "they're" contains a non-word character. Is there a simple way to check for it?

Upvotes: 2

Views: 2956

Answers (4)

Colin Saxton

Reputation: 11

By default, Java's regular expression \w is not Unicode-aware, although \b does take Unicode into account in Java. I think most regex flavours define \w as the standard [A-Za-z0-9_]. Also, isLetter accepts only letters, not digits or the underscore, so it does not match what a regular expression means by "word characters". Maybe Java has changed since?
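
To make the difference concrete, here is a minimal sketch comparing the default \w with the Pattern.UNICODE_CHARACTER_CLASS flag, which as far as I know has been available since Java 7. The class name UnicodeWordDemo and the helper isAllWordChars are made up for this illustration.

```java
import java.util.regex.Pattern;

class UnicodeWordDemo {
    // Hypothetical helper: true if every character of s matches \w.
    // With unicode == true, \w follows the Unicode definition of a word character.
    static boolean isAllWordChars(String s, boolean unicode) {
        int flags = unicode ? Pattern.UNICODE_CHARACTER_CLASS : 0;
        return Pattern.compile("\\w+", flags).matcher(s).matches();
    }

    public static void main(String[] args) {
        String word = "caf\u00e9"; // "café" with a precomposed e-acute
        System.out.println(isAllWordChars(word, false)); // false: default \w is ASCII-only
        System.out.println(isAllWordChars(word, true));  // true: Unicode-aware \w
    }
}
```

So whether "café" contains a non-word character depends on which definition of \w is in effect.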

Upvotes: 0

Mark Byers

Reputation: 838156

It depends entirely on what you mean by "word character".

If by "word character" you mean A-Z or a-z then you can use this:

boolean containsNonWordCharacter = s.matches(".*[^A-Za-z].*");

If you mean "any character that is considered to be a letter in Unicode", then look at Character.isLetter instead.

The code provided by bobbymcr nearly works:

public static boolean hasNonWordCharacter(String s) {
    char[] a = s.toCharArray();
    for (char c : a) {
        if (!Character.isLetter(c)) {
            return true;
        }
    }

    return false;
}

However, see the documentation for Character.isLetter(char):

Note: This method cannot handle supplementary characters. To support all Unicode characters, including supplementary characters, use the isLetter(int) method.

This should work for all Unicode characters:

public static boolean hasNonWordCharacter(String s) {
    int offset = 0, strLen = s.length();
    while (offset < strLen) {
        // Read a full code point, which may span two chars (a surrogate pair).
        int curChar = s.codePointAt(offset);
        offset += Character.charCount(curChar);
        if (!Character.isLetter(curChar)) {
            return true;
        }
    }

    return false;
}
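
To see where the two versions differ, here is a small sketch that runs both on a letter outside the Basic Multilingual Plane. The class and method names are invented for the example, and the code point variant uses String.codePoints() (Java 8+) rather than the manual loop above.

```java
class SupplementaryDemo {
    // char-based check: each half of a surrogate pair is tested on its own.
    static boolean hasNonLetterByChar(String s) {
        for (char c : s.toCharArray()) {
            if (!Character.isLetter(c)) {
                return true;
            }
        }
        return false;
    }

    // Code point-based check: a surrogate pair is tested as one character.
    static boolean hasNonLetterByCodePoint(String s) {
        return s.codePoints().anyMatch(cp -> !Character.isLetter(cp));
    }

    public static void main(String[] args) {
        // U+1D400 MATHEMATICAL BOLD CAPITAL A, a letter stored as two chars.
        String s = new String(Character.toChars(0x1D400));
        System.out.println(hasNonLetterByChar(s));      // true: surrogates are not letters
        System.out.println(hasNonLetterByCodePoint(s)); // false: the code point is a letter
    }
}
```

For plain ASCII input such as "equi-distant" both variants give the same answer; the difference only shows up with supplementary characters.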

Upvotes: 2

bobbymcr

Reputation: 24167

Solution without regex (generally faster for a very simple check like this):

public static boolean hasNonWordCharacter(String s) {
    char[] a = s.toCharArray();
    for (char c : a) {
        if (!Character.isLetter(c)) {
            return true;
        }
    }

    return false;
}
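
As a quick check against the strings from the question, here is a sketch that wraps the same loop in a throwaway class. LetterCheckDemo and hasNonLetterOrDigit are names invented for the example; the second method is only an assumption about what you might want if digits should count as word characters.

```java
class LetterCheckDemo {
    // Same logic as above: any non-letter counts as a non-word character.
    static boolean hasNonLetter(String s) {
        for (char c : s.toCharArray()) {
            if (!Character.isLetter(c)) {
                return true;
            }
        }
        return false;
    }

    // Variant that also accepts digits, closer to (but not the same as) regex \w.
    static boolean hasNonLetterOrDigit(String s) {
        for (char c : s.toCharArray()) {
            if (!Character.isLetterOrDigit(c)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasNonLetter("equi-distant"));   // true (hyphen)
        System.out.println(hasNonLetter("they're"));        // true (apostrophe)
        System.out.println(hasNonLetter("equidistant"));    // false
        System.out.println(hasNonLetter("route66"));        // true (digits are not letters)
        System.out.println(hasNonLetterOrDigit("route66")); // false
    }
}
```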

Upvotes: 6

stratwine

Reputation: 3701

I like the non-regex way, but with a regex it could be written like this:

import java.util.regex.Pattern;

// True if toCheck contains any character outside [A-Za-z0-9_].
private static boolean containsNonWord(String toCheck) {
    Pattern p = Pattern.compile("\\w*");
    return !p.matcher(toCheck).matches();
}
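
An equivalent way to phrase the same check is to search for a single \W instead of matching the whole string against \w*. The following is only a sketch: the class name NonWordFinder is made up, and the pattern is compiled once and reused rather than on every call.

```java
import java.util.regex.Pattern;

class NonWordFinder {
    // \W is the negation of \w, i.e. anything outside [A-Za-z0-9_] by default.
    private static final Pattern NON_WORD = Pattern.compile("\\W");

    static boolean containsNonWord(String toCheck) {
        // find() stops at the first non-word character it encounters.
        return NON_WORD.matcher(toCheck).find();
    }

    public static void main(String[] args) {
        System.out.println(containsNonWord("equi-distant")); // true
        System.out.println(containsNonWord("they're"));      // true
        System.out.println(containsNonWord("snake_case1"));  // false: _ and digits are word characters
    }
}
```

Note that with either form, digits and the underscore count as word characters, and an empty string is reported as containing no non-word characters.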

Upvotes: 2
