krosenvold
krosenvold

Reputation: 77201

Fastest way to decide if string contains only allowed values?

I'm trying to determine what is the fastest way to implement this code:

Pattern ID_REGEX = Pattern.compile( "[A-Za-z0-9_\\-.]+" );
boolean match = ID_REGEX.matcher( id ).matches();
if ( !match ) throw new IllegalArgumentException("Disallowed character in ID");

Given that the ID_REGEX is constant, I would assume something like a BitSet or an array of permitted values is the fastest way to implement this, maybe even just a huge if statement.

Note that the search is for A-Za-z, not Character.isLetter.

Additional kudos for an OSS implementation

Upvotes: 2

Views: 136

Answers (2)

Dariusz
Dariusz

Reputation: 22271

My fast but unclear solution

// encapsulate this into a class and do once; perhaps use a static initializer
boolean[] allowed = new boolean[256]; // default false
allowed[32] = true;
allowed['a'] = true;
// fill all allowed characters
allowed['Z'] = true;

// the check
for (int n=0,len=str.length(); n<len; n++) {
  char ch = str.charAt(n);
  if (ch>255 || !allowed[ch]) {
    return false;
  }
}
return true;

Some additional casts may be needed, but I hope the idea is clear.

Upvotes: 3

popfalushi
popfalushi

Reputation: 1362

I suppose you could iterate through string, get code of character, compare with ASCII table codes. This way at each character you have 3 between comparisons(for a-z, A-Z, 0-9) and 6 usual integer comparisons for other chars. I think it will be fastest. You could also try multithreading approach, in which you do basically the same, but divide problem into several parts before starting and make checks concurrently.
Edit(after @krosenvold comment): multithreading approach is suitable only for very big strings, because creating threads has its own price.

Upvotes: 0

Related Questions