Victor2748

Reputation: 4199

Java - what is the best way to check if a STRING contains only certain characters?

I have this problem: I have a String, but I need to make sure that it only contains letters A-Z and numbers 0-9. Here is my current code:

boolean valid = true;
for (char c : string.toCharArray()) {
    int type = Character.getType(c);
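    if (type == 2 || type == 1 || type == 9) {
        // the character is either a letter or a digit
        // (1 = Character.UPPERCASE_LETTER, 2 = Character.LOWERCASE_LETTER, 9 = Character.DECIMAL_DIGIT_NUMBER)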
    } else {
        valid = false;
        break;
    }
}

But what is the best and the most efficient way to implement it?

Upvotes: 10

Views: 40377

Answers (8)

Mikael Vandmo

Reputation: 945

StringUtils in Apache Commons Lang 3 has a containsOnly method, https://commons.apache.org/proper/commons-lang/apidocs/org/apache/commons/lang3/StringUtils.html

The implementation should be fast enough.

Upvotes: 3

Pier-Alexandre Bouchard

Reputation: 5245

The following approach takes longer to implement than a regular expression, but I think it is one of the most efficient solutions because it relies on bitwise operations, which are very fast.

It is more complex and harder to read and maintain than the alternatives, but it is another way to do what you want.

A good way to test that a string contains only digits or capital letters is a simple 128-bit bitmask (two longs) representing the ASCII table.

For the standard ASCII table, there is a 1 at every character we want to keep (bits 48 to 57 and bits 65 to 90).

Thus, you can test whether a char is a:

  1. Digit with this mask: 0x3FF000000000000L (if the character code < 65)
  2. Uppercase letter with this mask: 0x3FFFFFFL (if the character code >= 65, after subtracting 65 from the character code)

So the following method should work:

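public boolean validate(String aString) {
    for (int i = 0; i < aString.length(); i++) {
        char c = aString.charAt(i);

        // Java uses only the low 6 bits of a long shift distance, so a code
        // above 'Z' (90) must be rejected explicitly or it could wrap around
        // and hit a set bit in the letter mask.
        if (c > 90
                || (c <= 64 && (0x3FF000000000000L & (1L << c)) == 0)
                || (c > 64 && (0x3FFFFFFL & (1L << (c - 65))) == 0)) {
            return false;
        }
    }

    return true;
}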
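
As a cross-check on the two constants, the masks can also be built bit by bit from the character ranges. The following is only a sketch: the class name AsciiMaskSketch and the helper rangeMask are made up for illustration and are not part of the answer. It also bounds the input before shifting, because Java keeps only the low six bits of a long shift distance, so a code such as 201 ('É') would otherwise wrap around onto a set bit.

```java
final class AsciiMaskSketch {
    // Bits 48..57 ('0'..'9') of the low 64-bit word.
    static final long DIGIT_MASK = rangeMask('0', '9', 0);
    // Bits 0..25 of the high word: 'A'..'Z' after subtracting 65.
    static final long UPPER_MASK = rangeMask('A', 'Z', 65);

    // Hypothetical helper: sets one bit per character in [from, to], shifted down by offset.
    static long rangeMask(char from, char to, int offset) {
        long mask = 0L;
        for (char c = from; c <= to; c++) {
            mask |= 1L << (c - offset);
        }
        return mask;
    }

    static boolean isUpperOrDigit(char c) {
        if (c > 'Z') {
            return false; // shift distances wrap at 64, so bound the input first
        }
        return c < 65
                ? (DIGIT_MASK & (1L << c)) != 0
                : (UPPER_MASK & (1L << (c - 65))) != 0;
    }

    public static void main(String[] args) {
        System.out.println(Long.toHexString(DIGIT_MASK)); // 3ff000000000000
        System.out.println(Long.toHexString(UPPER_MASK)); // 3ffffff
    }
}
```

Deriving the masks in code costs nothing at run time (they are computed once when the class loads) and makes it easier to widen the accepted ranges later.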

Upvotes: 2

Kirill Rakhman

Reputation: 43861

Additionally to all the other answers, here's a Guava approach:

boolean valid = CharMatcher.JAVA_LETTER_OR_DIGIT.matchesAllOf(string);

More on CharMatcher: https://code.google.com/p/guava-libraries/wiki/StringsExplained#CharMatcher

Upvotes: 3

Michael Krause

Reputation: 4869

Since no one else has worried about "fastest" yet, here is my contribution:

boolean valid = true;

char[] a = s.toCharArray();

for (char c: a)
{
    valid = ((c >= 'a') && (c <= 'z')) || 
            ((c >= 'A') && (c <= 'Z')) || 
            ((c >= '0') && (c <= '9'));

    if (!valid)
    {
        break;
    }
}

return valid;

Full test code below:

public static void main(String[] args)
{
    String[] testStrings = {"abcdefghijklmnopqrstuvwxyz0123456789", "", "00000", "abcdefghijklmnopqrstuvwxyz0123456789&", "1", "q", "test123", "(#*$))&v", "ABC123", "hello", "supercalifragilisticexpialidocious"};

    long startNanos = System.nanoTime();

    for (String testString: testStrings)
    {
        isAlphaNumericOriginal(testString);
    }

    System.out.println("Time for isAlphaNumericOriginal: " + (System.nanoTime() - startNanos) + " ns"); 

    startNanos = System.nanoTime();

    for (String testString: testStrings)
    {
        isAlphaNumericFast(testString);
    }

    System.out.println("Time for isAlphaNumericFast: " + (System.nanoTime() - startNanos) + " ns");

    startNanos = System.nanoTime();

    for (String testString: testStrings)
    {
        isAlphaNumericRegEx(testString);
    }

    System.out.println("Time for isAlphaNumericRegEx: " + (System.nanoTime() - startNanos) + " ns");

    startNanos = System.nanoTime();

    for (String testString: testStrings)
    {
        isAlphaNumericIsLetterOrDigit(testString);
    }

    System.out.println("Time for isAlphaNumericIsLetterOrDigit: " + (System.nanoTime() - startNanos) + " ns");      
}

private static boolean isAlphaNumericOriginal(String s)
{
    boolean valid = true;
    for (char c : s.toCharArray()) 
    {
        int type = Character.getType(c);
        if (type == 2 || type == 1 || type == 9) 
        {
            // the character is either a letter or a digit
        }
        else 
        {
            valid = false;
            break;
        }
    }

    return valid;
}

private static boolean isAlphaNumericFast(String s)
{
    boolean valid = true;

    char[] a = s.toCharArray();

    for (char c: a)
    {
        valid = ((c >= 'a') && (c <= 'z')) || 
                ((c >= 'A') && (c <= 'Z')) || 
                ((c >= '0') && (c <= '9'));

        if (!valid)
        {
            break;
        }
    }

    return valid;
}

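// Requires: import java.util.regex.Pattern;
// Note: because of the '+', this rejects the empty string, unlike the loop-based checks.
private static boolean isAlphaNumericRegEx(String s)
{
    return Pattern.matches("[\\dA-Za-z]+", s);
}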

private static boolean isAlphaNumericIsLetterOrDigit(String s)
{
    boolean valid = true;
    for (char c : s.toCharArray()) { 
        if(!Character.isLetterOrDigit(c))
        {
            valid = false;
            break;
        }
    }
    return valid;
}

Produces this output for me:

Time for isAlphaNumericOriginal: 164960 ns
Time for isAlphaNumericFast: 18472 ns
Time for isAlphaNumericRegEx: 1978230 ns
Time for isAlphaNumericIsLetterOrDigit: 110315 ns
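
These numbers come from a single pass over a handful of strings, so they include one-time costs such as class loading and JIT warm-up, and the regex version additionally recompiles its pattern on every call. A rough sketch of a slightly fairer measurement follows; the class name WarmedUpTiming, the method averageNanosPerPass, and the pass counts are assumptions made for illustration, and a dedicated benchmarking harness would still be more trustworthy.

```java
import java.util.function.Predicate;
import java.util.regex.Pattern;

final class WarmedUpTiming {
    // Same range check as isAlphaNumericFast, written without toCharArray().
    static boolean rangeCheck(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            boolean ok = (c >= 'a' && c <= 'z')
                    || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9');
            if (!ok) {
                return false;
            }
        }
        return true;
    }

    // Same expression as isAlphaNumericRegEx.
    static boolean regexCheck(String s) {
        return Pattern.matches("[\\dA-Za-z]+", s);
    }

    // Runs unmeasured warm-up passes, then returns the average nanoseconds per measured pass.
    static long averageNanosPerPass(Predicate<String> check, String[] inputs,
                                    int warmupPasses, int measuredPasses) {
        int accepted = 0; // consumed below so the calls are not trivially dead code
        for (int i = 0; i < warmupPasses; i++) {
            for (String s : inputs) {
                if (check.test(s)) accepted++;
            }
        }
        long start = System.nanoTime();
        for (int i = 0; i < measuredPasses; i++) {
            for (String s : inputs) {
                if (check.test(s)) accepted++;
            }
        }
        long elapsed = System.nanoTime() - start;
        if (accepted < 0) {
            System.out.println(accepted); // never true; keeps the counter observable
        }
        return elapsed / measuredPasses;
    }

    public static void main(String[] args) {
        String[] inputs = {"abcdefghijklmnopqrstuvwxyz0123456789", "test123", "(#*$))&v", "ABC123"};
        System.out.println("range check: "
                + averageNanosPerPass(WarmedUpTiming::rangeCheck, inputs, 10_000, 10_000) + " ns per pass");
        System.out.println("regex check: "
                + averageNanosPerPass(WarmedUpTiming::regexCheck, inputs, 10_000, 10_000) + " ns per pass");
    }
}
```

The absolute figures will differ from machine to machine; only the relative ordering is worth reading from a sketch like this.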

Upvotes: 14

Hannes

Reputation: 2073

In terms of maintainability and simplicity, the best way is the regular expression that has already been posted. Once you are familiar with the technique, you know what to expect, and it is very easy to widen the criteria if needed. The downside is performance.

The fastest way is the array approach. Checking whether a character's numeric value falls within the wanted ASCII ranges A-Z and 0-9 is extremely fast. But maintainability is poor and the simplicity is gone.

You could also use a Java switch statement over the char values, but that is just as bad as the second approach.

In the end, since we are talking about Java, I would strongly suggest using regular expressions.

Upvotes: 1

If you want to avoid regex, then the Character class can help:

boolean valid = true;
for (char c : string.toCharArray()) { 
    if(!Character.isLetterOrDigit(c))
    {
        valid = false;
        break;
    }
}

If you only want to allow upper-case letters, use the following if statement instead:

if(!((Character.isLetter(c) && Character.isUpperCase(c)) || Character.isDigit(c)))
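
One thing to keep in mind is that the Character methods are Unicode-aware, so they accept more than A-Z and 0-9. The sketch below puts the condition above into a complete method next to a plain ASCII range check to show the difference. The class and method names are invented for illustration.

```java
final class UpperOrDigitSketch {
    // Unicode-aware: accepts any uppercase letter or digit that Character recognizes.
    static boolean unicodeUpperOrDigit(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (!((Character.isLetter(c) && Character.isUpperCase(c)) || Character.isDigit(c))) {
                return false;
            }
        }
        return true;
    }

    // ASCII-only: accepts exactly 'A'..'Z' and '0'..'9'.
    static boolean asciiUpperOrDigit(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (!((c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9'))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String accented = "\u00C9COLE42"; // starts with an accented capital E
        System.out.println(unicodeUpperOrDigit(accented)); // true
        System.out.println(asciiUpperOrDigit(accented));   // false
        System.out.println(asciiUpperOrDigit("ECOLE42"));  // true
    }
}
```

If the requirement really is "A-Z and 0-9 only", the ASCII variant matches it more closely; the Character-based variant fits when accented or non-Latin uppercase letters should pass too.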

Upvotes: 9

Christoph Lühr

Reputation: 53

You could use Apache Commons Lang:

StringUtils.isAlphanumeric(String)

Upvotes: 2

M A

Reputation: 72884

Use a regular expression:

Pattern.matches("[\\dA-Z]+", string)

[\\dA-Z]+: At least one occurrence (+) of digits or uppercase letters.

If you want to include lowercase letters, replace [\\dA-Z]+ with [\\dA-Za-z]+.
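
Pattern.matches compiles the expression on every call, so when the check runs often it may be worth compiling the pattern once and reusing it. A minimal sketch, with the class name UpperAlnumPattern and the method isValid chosen only for illustration:

```java
import java.util.regex.Pattern;

final class UpperAlnumPattern {
    // Compiled once; Pattern instances are immutable and safe to share between threads.
    private static final Pattern UPPER_ALNUM = Pattern.compile("[\\dA-Z]+");

    static boolean isValid(String s) {
        // matches() requires the whole input to match, not just a substring.
        return UPPER_ALNUM.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("ABC123")); // true
        System.out.println(isValid("abc123")); // false
        System.out.println(isValid(""));       // false, because + needs at least one character
    }
}
```

If the empty string should count as valid, the + can be replaced with *.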

Upvotes: 2
