Terence Chan

Reputation: 21

Java performance issue with a long StringTokenizer

I have a program that reads and processes data in a raw text String using StringTokenizer.

Originally the StringTokenizer contained about 1,500 tokens and the program worked fine. However, the raw content has grown and it is now about 12,000 tokens, and CPU consumption has increased significantly.

I'm looking into the problem and trying to identify the root cause. The program uses a while loop to check whether any tokens are left and, based on the token read, a different action is taken. I'm checking those different actions to see whether they can be improved.

Meanwhile, I would like to ask whether handling one long StringTokenizer costs more CPU than handling 10 short StringTokenizers.

Upvotes: 1

Views: 1037

Answers (3)

user890904

Reputation:

StringTokenizer usage is discouraged according to the StringTokenizer Javadoc. It is not deprecated, though, so it is possible to use; it is just not recommended. Here is what is written:

"StringTokenizer is a legacy class that is retained for compatibility reasons although its use is discouraged in new code. It is recommended that anyone seeking this functionality use the split method of String or the java.util.regex package instead."

Please check the following post. It has a very nice example of various ways of doing the same thing you are trying to do.

performance-of-stringtokenizer-class-vs-split-method-in-java

You can try the samples provided there and see what works best for you.
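
As a quick illustration of the split-based approach the Javadoc recommends, here is a minimal sketch (the ";" separator and sample message are just examples; Pattern.quote guards against the separator being a regex metacharacter):

import java.util.regex.Pattern;

public class SplitExample {
    public static void main(String[] args) {
        String fieldSeparator = ";";  // example delimiter, not from the question
        String msg = "field1;field2;field3";
        // split() takes a regular expression, so quote the separator
        // in case it contains regex metacharacters
        String[] tokens = msg.split(Pattern.quote(fieldSeparator));
        for (String token : tokens) {
            System.out.println(token);  // handle each token here
        }
    }
}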

Upvotes: 1

Terence Chan

Reputation: 21

First of all, thanks for your opinions. Last weekend I ran a stress test with real data using a revised program, and I am happy to say that my problem is solved (many thanks to A.J. ^_^). I would like to share my findings.

After studying the example mentioned by A.J., I ran some test programs that read and process data using StringTokenizer and "indexOf" (regex is even worse than StringTokenizer in my situation). My test program counts how many milliseconds are needed to process 24 messages (~12,000 tokens each).

StringTokenizer needs ~2700 ms to complete, while "indexOf" takes only ~210 ms!
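
(For reference, my timing loop was roughly like the sketch below. This is not the exact test program; the synthetic message is a stand-in for my real data.)

public class TokenBenchmark {
    public static void main(String[] args) {
        String fieldSeparator = ";";
        // build a synthetic message with ~12,000 tokens
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 12000; i++) {
            sb.append("field").append(i).append(fieldSeparator);
        }
        String msg = sb.toString();

        long start = System.nanoTime();
        for (int m = 0; m < 24; m++) {
            java.util.StringTokenizer st = new java.util.StringTokenizer(msg, fieldSeparator);
            while (st.hasMoreTokens()) {
                st.nextToken();  // consume each token
            }
        }
        long elapsedMs = (System.nanoTime() - start) / 1000000L;
        System.out.println("StringTokenizer: " + elapsedMs + " ms");
    }
}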

I then revised my program like this (with minimal changes) and tested it with real volume last weekend:

Original program:

public class MsgProcessor {
    // Some other definitions and methods ...

    public void processMessage (String msg) 
    {
        //...

        StringTokenizer token = new StringTokenizer(msg, FieldSeparator);
        while (token.hasMoreTokens()) {
            my_data = token.nextToken();
            // perform a different action based on the token read
        }
    }
}

And here is updated program using "indexOf":

public class MsgProcessor {
    // Some other definitions and methods ...
    private int tokenStart=0;
    private int tokenEnd=0;

    public void processMessage (String msg) 
    {
        //...
        tokenStart=0;
        tokenEnd=0;

        while (isReadingData) {
            my_data = getToken(msg);
            if (my_data == null)
                break;
            // perform a different action based on the token read ...
        }
    }

    // Returns the next token, or null when no more FieldSeparator is found
    private String getToken (String msg)
    {
        String result = null;
        // search for the next separator from the current read position
        if ((tokenEnd = msg.indexOf(FieldSeparator, tokenStart)) >= 0) {
            result = msg.substring(tokenStart, tokenEnd);
            tokenStart = tokenEnd + 1;  // move past the separator
        }
        return result;
    }
}

  • Please note that there is no "null" data in the original tokens. If no FieldSeparator is found, "getToken(msg)" returns null (as a signal for "no more tokens").
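
For anyone who wants to try it out, here is a self-contained sketch of the same "indexOf" technique; the separator and sample message are made up for illustration:

public class GetTokenDemo {
    private static final String FieldSeparator = ";";  // assumed delimiter
    private int tokenStart = 0;
    private int tokenEnd = 0;

    private String getToken(String msg) {
        String result = null;
        if ((tokenEnd = msg.indexOf(FieldSeparator, tokenStart)) >= 0) {
            result = msg.substring(tokenStart, tokenEnd);
            tokenStart = tokenEnd + 1;
        }
        return result;
    }

    public static void main(String[] args) {
        GetTokenDemo demo = new GetTokenDemo();
        // trailing separator so the last field is also returned
        String msg = "field1;field2;field3;";
        String token;
        while ((token = demo.getToken(msg)) != null) {
            System.out.println(token);  // prints field1, field2, field3
        }
    }
}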

Upvotes: 1

Mister Smith

Reputation: 28168

Why don't you try the newer Scanner class instead? Scanners can be constructed from streams and files. I'm not sure it is more efficient than the old StringTokenizer, though.
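
For example (a rough sketch; the ";" delimiter and sample message are just for illustration):

import java.util.Scanner;

public class ScannerExample {
    public static void main(String[] args) {
        String msg = "field1;field2;field3";
        Scanner scanner = new Scanner(msg);
        scanner.useDelimiter(";");  // use the field separator as the delimiter
        while (scanner.hasNext()) {
            String token = scanner.next();
            System.out.println(token);  // handle each token here
        }
        scanner.close();
    }
}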

Upvotes: 0
