Nativ

Reputation: 3150

Fastest way to detect if a string contains specific chars

I'm building an XML parser that goes over a big XML file, and I'm looking for the fastest way to detect whether a string contains any char other than " ", "\n" or "\r". I've tried using a regex, but it is too slow and heavy. Another method I tried was to count the occurrences of " ", "\n" and "\r" (by their ASCII values) and subtract that count from the length of the String; if the length is larger, there's at least one other char. This operation is also heavy. Good advice would be appreciated.

Edit - Clarification:

By "too slow" I mean 300 milliseconds for one line of XML parsing plus string manipulation.

Examples of the two approaches I implemented:

By regex:

if (!str.matches(".*\\w.*"))
{
  // str doesn't contain any word chars
}

By counting whitespace and special chars:

// +1 for the ending \r in str
if (numOfWhitespaces + numOfSpecialChars >= str.length())
{
    // str doesn't contain any other chars
}

The first solution (regex) is slower by about 200 milliseconds. On a file with 500+ lines (where each line is processed independently), that difference is crucial.

I hope it's clear enough. Thanks!

Upvotes: 0

Views: 1404

Answers (1)

Peter Lawrey

Reputation: 533472

The fastest way to scan a String is with a SAX listener:

public void characters(char ch[], int start, int length) throws SAXException {
    for(int i=start, end = start+ length; i < end; i++) {
       if(ch[i] <= ' ') {
          // check if it is a white space
       }
    }
}

If you are not using a SAX parser or another event-driven parser, that could be your performance bottleneck.
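To make the idea concrete, here is a minimal sketch of how the character loop above could be plugged into a SAX handler. The class name TextScan, the helper hasNonBlank, and the handler TextDetector are hypothetical names chosen for illustration. The check treats only ' ', '\n' and '\r' as blank, as described in the question, so a tab would count as content.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class TextScan {

    // Returns true if ch[start .. start+length) holds any char other than
    // ' ', '\n' or '\r'. Stops at the first such char, with no allocation.
    static boolean hasNonBlank(char[] ch, int start, int length) {
        for (int i = start, end = start + length; i < end; i++) {
            char c = ch[i];
            if (c != ' ' && c != '\n' && c != '\r') {
                return true;
            }
        }
        return false;
    }

    // Hypothetical handler: remembers whether any text chunk had real content.
    // A parser may deliver one text node in several characters() calls,
    // so the flag is only ever set, never cleared.
    static class TextDetector extends DefaultHandler {
        boolean sawText = false;

        @Override
        public void characters(char[] ch, int start, int length) {
            if (!sawText && hasNonBlank(ch, start, length)) {
                sawText = true;
            }
        }
    }

    // Parses the XML once with the JDK's default SAX parser and reports
    // whether any element text contained a non-blank char.
    static boolean containsText(String xml) throws Exception {
        TextDetector handler = new TextDetector();
        SAXParserFactory.newInstance()
                .newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.sawText;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(containsText("<a> \n <b>x</b></a>"));
        System.out.println(containsText("<a> \n <b> </b></a>"));
    }
}
```

The loop works directly on the char[] buffer the parser hands over, so no String or regex objects are created per chunk. Whether this removes the bottleneck in your case is something only a measurement on your real file can tell.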

Upvotes: 4
