JD White

Reputation: 927

Java - Statically defined character lists

Is there a definition of character classes (alpha, numeric, alphanumeric) in any of the standard libraries? I'm checking whether a string contains only alphanumeric characters or a colon:

StringUtils.containsOnly(input, ALPHA_NUMERIC + ":");

I could define ALPHA_NUMERIC myself, but I would expect common character classes to be defined in a standard library. So far I have been unable to find such definitions.
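For reference, defining the constants by hand is short. The following is only a minimal sketch: the class name CharClasses, the constant names, and the containsOnly helper are hypothetical, and the helper is a plain-Java stand-in for a containsOnly-style check on non-null input rather than the Commons Lang implementation.

```java
public final class CharClasses {
    // Hand-written ASCII character lists (hypothetical names).
    public static final String ALPHA =
            "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
    public static final String NUMERIC = "0123456789";
    public static final String ALPHA_NUMERIC = ALPHA + NUMERIC;

    private CharClasses() {}

    // Returns true if every character of input appears in allowed.
    // An empty input returns true; null input is not handled here.
    public static boolean containsOnly(String input, String allowed) {
        for (int i = 0; i < input.length(); i++) {
            if (allowed.indexOf(input.charAt(i)) < 0) {
                return false;
            }
        }
        return true;
    }
}
```

Usage would look like CharClasses.containsOnly(input, CharClasses.ALPHA_NUMERIC + ":"). Note that each character costs a linear indexOf over the allowed list, so this is not the fastest possible check.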

edit: I did consider regex, but for my particular use case execution time is important, and a simple scan is more efficient.

edit: Here are the test results, using Regex, CharMatcher, and a simple scan (using the same set of valid/invalid input strings for each test):

Valid Input Strings:

CharMatcher, Num Runs: 1000000, Valid Strings: true, Time (ms): 1200

Regex, Num Runs: 1000000, Valid Strings: true, Time (ms): 909

Scan, Num Runs: 1000000, Valid Strings: true, Time (ms): 96

Invalid input strings:

CharMatcher, Num Runs: 1000000, Valid Strings: false, Time (ms): 277

Regex, Num Runs: 1000000, Valid Strings: false, Time (ms): 253

Scan, Num Runs: 1000000, Valid Strings: false, Time (ms): 36

Here is the code that performed the scan:

public boolean matches(String input) {
    for(int i=0; i<input.length(); i++) {
        char c = input.charAt(i);
        if( !Character.isLetterOrDigit(c) && c != ':') {
            return false;
        }
    }
    return true;
}
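One caveat about the scan above: Character.isLetterOrDigit accepts any Unicode letter or digit, so it is not equivalent to the ASCII-only \p{Alnum} regex class (unless the pattern is compiled with UNICODE_CHARACTER_CLASS). If ASCII-only input is what is intended, a variant along these lines might be closer. This is a sketch, and the class name AsciiScan is hypothetical.

```java
public final class AsciiScan {
    private AsciiScan() {}

    // Returns true if input consists only of ASCII letters,
    // ASCII digits, or ':'. An empty string returns true.
    public static boolean matches(String input) {
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            boolean allowed = (c >= 'a' && c <= 'z')
                    || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9')
                    || c == ':';
            if (!allowed) {
                return false;
            }
        }
        return true;
    }
}
```

With this variant, a string such as "caf\u00e9" is rejected, whereas the Character.isLetterOrDigit version accepts it.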

edit: I recompiled as a standalone program (I had been running through Eclipse):

CharMatcherTester, Num Runs: 1000000, Valid Strings: true, Time (ms): 418

RegexTester, Num Runs: 1000000, Valid Strings: true, Time (ms): 812

ScanTester, Num Runs: 1000000, Valid Strings: true, Time (ms): 88

CharMatcherTester, Num Runs: 1000000, Valid Strings: false, Time (ms): 142

RegexTester, Num Runs: 1000000, Valid Strings: false, Time (ms): 223

ScanTester, Num Runs: 1000000, Valid Strings: false, Time (ms): 32

Source: https://bitbucket.org/jdeveloperw/testing (This is my first time posting test results to SO, so comments are appreciated.)

Upvotes: 2

Views: 363

Answers (5)

Louis Wasserman

Reputation: 198471

Guava's CharMatcher is pretty much exactly what you're asking for. Here is the wiki article. (Disclosure: I contribute to Guava.)

CharMatcher matcher = CharMatcher.JAVA_LETTER_OR_DIGIT.or(
  CharMatcher.is(':'));
return matcher.matchesAllOf(string);

Upvotes: 1

Óscar López

Reputation: 236140

Try this, using regular expressions:

boolean containsOnlyAlphanumeric = input.matches("[\\p{Alnum}:]+");

EDIT:

For the best performance you can pre-compile the pattern, store it in a statically defined pattern constant and reuse it whenever necessary:

// part of the class declaration
private static final Pattern ALPHANUMERIC_PLUS_COLON = Pattern.compile("[\\p{Alnum}:]+");

// whenever you need to check if the input matches the pattern
boolean containsOnlyAlphanumeric = ALPHANUMERIC_PLUS_COLON.matcher(input).matches();

I agree with Matthew Flaschen: you should not discard regular expressions right away. A well-built, pre-compiled regex can be as fast as, if not faster than, a scan that checks for all possible valid characters in the input string. Benchmark first!
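A rough way to run such a comparison is sketched below. It is only an illustration: the class name MatchBenchmark, the helper names, and the sample inputs are hypothetical, and a hand-rolled System.nanoTime loop is sensitive to JIT warm-up, so a harness such as JMH would give more trustworthy numbers.

```java
import java.util.function.Predicate;
import java.util.regex.Pattern;

public final class MatchBenchmark {
    private static final Pattern ALNUM_COLON = Pattern.compile("[\\p{Alnum}:]+");

    private MatchBenchmark() {}

    // Pre-compiled regex check.
    public static boolean regexMatches(String input) {
        return ALNUM_COLON.matcher(input).matches();
    }

    // Hand-written ASCII scan with the same semantics as the regex
    // (the '+' quantifier means an empty string does not match).
    public static boolean scanMatches(String input) {
        if (input.isEmpty()) {
            return false;
        }
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            boolean allowed = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9') || c == ':';
            if (!allowed) {
                return false;
            }
        }
        return true;
    }

    // Runs the check over all inputs 'runs' times and returns elapsed nanoseconds.
    public static long timeNanos(Predicate<String> check, String[] inputs, int runs) {
        int hits = 0;
        long start = System.nanoTime();
        for (int r = 0; r < runs; r++) {
            for (String s : inputs) {
                if (check.test(s)) {
                    hits++;
                }
            }
        }
        long elapsed = System.nanoTime() - start;
        if (hits < 0) { // never true; keeps the result observably used
            throw new IllegalStateException();
        }
        return elapsed;
    }

    public static void main(String[] args) {
        String[] inputs = {"abc:123", "ABC", "not valid!", "x:y:z"};
        // Warm-up passes so the JIT has compiled both paths before timing.
        timeNanos(MatchBenchmark::regexMatches, inputs, 100_000);
        timeNanos(MatchBenchmark::scanMatches, inputs, 100_000);
        System.out.println("regex ns: " + timeNanos(MatchBenchmark::regexMatches, inputs, 1_000_000));
        System.out.println("scan  ns: " + timeNanos(MatchBenchmark::scanMatches, inputs, 1_000_000));
    }
}
```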

Upvotes: 2

Hiro2k

Reputation: 5587

Well, it does exist if you are talking about regex, where the character class \w comes close: it matches letters and digits, but also the underscore. That's why the String class has the matches method.

edit: That StringUtils class probably predates Java 1.4, when the matches method was added. A lot of the functionality that the Apache Commons classes provide has been folded into the standard library. They are still useful when you have to use old versions of Java or need something that isn't in the standard library, but this doesn't seem to be one of those cases.
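When choosing between \w and \p{Alnum}, keep in mind that \w also matches the underscore. A small sketch of the difference (the class and method names are hypothetical):

```java
public final class WordClassDemo {
    private WordClassDemo() {}

    // \w is [a-zA-Z_0-9] by default, so underscores are accepted.
    public static boolean wordOnly(String s) {
        return s.matches("\\w+");
    }

    // \p{Alnum} is [a-zA-Z0-9] by default, so underscores are rejected.
    public static boolean alnumOnly(String s) {
        return s.matches("\\p{Alnum}+");
    }
}
```

For the input "a_b", wordOnly returns true while alnumOnly returns false, so \p{Alnum} is the closer fit for a strictly alphanumeric check.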

Upvotes: 2

Matthew Flaschen

Reputation: 285047

Your best bet is probably a regex Pattern.

If the input is all alphanumeric (or :), it should match:

[\p{Alnum}:]*
  • \p{Alnum} - ASCII alphanumeric
  • [] - character class (any one of the characters inside matches a single character)
  • : - literal :
  • * - 0 or more

You can use String.matches or pre-compile the regex.
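One detail worth checking is the quantifier: with * the empty string matches, while the + used in another answer requires at least one character. A minimal sketch comparing the two, pre-compiled (the class and method names are hypothetical):

```java
import java.util.regex.Pattern;

public final class QuantifierDemo {
    // Zero or more allowed characters: the empty string matches.
    private static final Pattern STAR = Pattern.compile("[\\p{Alnum}:]*");
    // One or more allowed characters: the empty string does not match.
    private static final Pattern PLUS = Pattern.compile("[\\p{Alnum}:]+");

    private QuantifierDemo() {}

    public static boolean matchesStar(String s) {
        return STAR.matcher(s).matches();
    }

    public static boolean matchesPlus(String s) {
        return PLUS.matcher(s).matches();
    }
}
```

Both variants accept "a:1" and reject "a b"; they differ only on the empty string, so the choice depends on whether empty input should count as valid.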

Upvotes: 5

Jack

Reputation: 146

Regex matching would do the job. For example: myString.matches("[a-zA-Z0-9:]*");

Upvotes: 0
