String.split vs StringTokenizer on efficiency level

Question

I work with a large-scale data set, so I am interested in the most efficient way to split a String.

I found the questions "Scanner vs. StringTokenizer vs. String.Split" and "string tokenizer in Java", which pretty much state that I should not use StringTokenizer.

I was convinced not to use it until I checked @Neil Coffey's experiment chart in the second post, "Performance of string tokenisation: String.split() and StringTokenizer compared", where StringTokenizer is notably faster.

So my question is: should I avoid a class because it is legacy (as is officially stated), or should I use it anyway? Efficiency is crucial in my project. Shouldn't String.split be at least comparably fast?

Is there any other fast string split alternative?

Ashok_Pradhan · Accepted Answer

Efficient and more feature-rich string-splitting methods are available in the Google Guava library.

Guava's split method

Example:

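// Requires: import com.google.common.base.Splitter;
Iterable<String> parts = Splitter.on(',')
    .omitEmptyStrings()   // drop tokens that are empty (after trimming)
    .trimResults()        // strip surrounding whitespace from each token
    .split("one,two,,   ,three");

for (String text : parts) {
  System.out.println(text);
}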

Output:

one
two
three
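
If adding a dependency is not an option, a hand-rolled loop over String.indexOf is another approach that can work when the delimiter is a single character, since it avoids regular expressions entirely. The following is only a minimal sketch under that assumption: CharSplitter is a hypothetical helper name (not part of the JDK or Guava), and it has not been benchmarked here, so whether it actually beats String.split or StringTokenizer on your data is something to measure yourself.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of the JDK or Guava.
final class CharSplitter {

    private CharSplitter() {}

    // Splits input on a single delimiter character using indexOf,
    // so no regular expression is compiled or matched.
    // Empty tokens (leading, interior, and trailing) are kept.
    static List<String> split(String input, char delimiter) {
        List<String> tokens = new ArrayList<>();
        int start = 0;
        int end;
        while ((end = input.indexOf(delimiter, start)) >= 0) {
            tokens.add(input.substring(start, end));
            start = end + 1; // continue just past the delimiter
        }
        tokens.add(input.substring(start)); // remainder after the last delimiter
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(split("one,two,,three", ',')); // prints [one, two, , three]
    }
}
```

Unlike the Guava example above, this sketch neither trims tokens nor drops empty ones; add those steps only if you need them, since each one costs extra work per token.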
