mat_boy

Reputation: 13686

Effective way to build a list of tokens from multiple strings in Java

I'm looking for an efficient way to obtain a list of String tokens extracted from multiple Strings (e.g. with a whitespace separator).

Example:

String s1 = "My mom cook everyday";
String s2 = "I eat everyday";
String s3 = "Am I fat?";  
LinkedList<String> tokens = new LinkedList<String>();   
//any code to efficiently get the tokens

//final result: tokens is a list made of the following tokens:
//"My", "mom", "cook", "everyday", "I", "eat", "everyday", "Am", "I", "fat?".

Now

  1. I'm not sure that LinkedList is the most efficient collection class to use here (could Apache Commons or Guava help?).
  2. I was going to use StringUtils from Apache Commons, but its split method returns an array, so I would have to copy the Strings out of that array with a for loop. I don't know whether that is efficient, since split allocates an array.
  3. I read about Splitter from Guava, but this post states that StringUtils is better in practice.
  4. What about Scanner from java.util? It doesn't seem to allocate any additional data structures, does it?

Please suggest the most efficient Java solution, even if it uses an additional widely used library such as Guava or Apache Commons.

Upvotes: 0

Views: 1318

Answers (5)

Rodrigo Sasaki

Reputation: 7226

If you have small Strings and performance isn't an issue, you can just combine split with addAll like this:

String s1 = "My mom cook everyday";
String s2 = "I eat everyday";
String s3 = "Am I fat?";  
List<String> tokens = new ArrayList<String>();  

tokens.addAll(Arrays.asList(s1.split("\\s+")));
tokens.addAll(Arrays.asList(s2.split("\\s+")));
tokens.addAll(Arrays.asList(s3.split("\\s+")));

System.out.println(tokens);
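
As a rough sketch of the same idea for any number of inputs: Collections.addAll can copy the array returned by split straight into the list, which skips the intermediate Arrays.asList wrapper. The class name SplitAddAll and the method tokenize are hypothetical, chosen only for this illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper for illustration; not part of the original answer.
final class SplitAddAll {

    // Splits each input on runs of whitespace and appends the pieces to one list.
    static List<String> tokenize(String... inputs) {
        List<String> tokens = new ArrayList<>();
        for (String input : inputs) {
            // Copies the array elements into the list without wrapping the array first.
            Collections.addAll(tokens, input.split("\\s+"));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("My mom cook everyday", "I eat everyday", "Am I fat?"));
    }
}
```

Note that split still allocates one array per input string, so this mainly tidies the code and is unlikely to change performance much.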

However, if performance is an issue, here's an alternative solution:

Since the question doesn't say how these long texts are acquired, I'll assume they come from an InputStream. See if this method is performant enough for your needs:

public List<String> readTokens(InputStream is) throws IOException{
    Reader reader = new InputStreamReader(is);
    List<String> tokens = new ArrayList<String>();
    BufferedReader bufferedReader = new BufferedReader(reader);
    String line = null;
    while((line = bufferedReader.readLine()) != null){
        String[] lineTokens = StringUtils.split(line, " "); 
        for(int i = 0 ; i < lineTokens.length ; i++){
            tokens.add(lineTokens[i]);
        }
    }
    return tokens;
}

And as to your statement regarding ArrayList vs LinkedList for inserting at the end, perhaps you should read this

Upvotes: 4

Tarun Bharti

Reputation: 195

import java.util.ArrayList;
import java.util.Collections;

public class stringintotoken {
    String s = "my name is tarun bharti";
    ArrayList<String> words = new ArrayList<String>();

    public static void main(String[] args) {
        stringintotoken st = new stringintotoken();
        st.go();
    }

    // Prints the words in their original order, then sorted.
    public void go() {
        wordlist();
        System.out.println(words);
        Collections.sort(words);
        System.out.println(words);
    }

    // Splits s on single spaces and appends each token to words.
    public void wordlist() {
        String[] tokens = s.split(" ");
        for (int i = 0; i < tokens.length; i++) {
            words.add(tokens[i]);
        }
    }
}

Upvotes: 0

Louis Wasserman

Reputation: 198341

for (String str : Arrays.asList(s1, s2, s3)) {
  Iterables.addAll(tokens, Splitter.on(' ').split(str));
}

would be the way I'd do it. That said, ArrayList is preferable to LinkedList for almost all use cases; without further data, we really can't tell whether or not you're in one of those rare cases where LinkedList is preferable.

Upvotes: 5

metaphori

Reputation: 2811

First join your strings using your separator (see Join a string using delimiters). Then:

 LinkedList<String> tokens = new LinkedList<String>();
 StringTokenizer st = new StringTokenizer(yourstr); // default delimiters: space, tab, newline, carriage return, form feed
 while (st.hasMoreTokens()) {
     tokens.add(st.nextToken());
 }
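
Put together, the join-then-tokenize steps above might look like the following sketch. It assumes Java 8 or later for String.join; the class name JoinThenTokenize and the method tokenize are hypothetical names used only for illustration.

```java
import java.util.LinkedList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical helper for illustration; not part of the original answer.
final class JoinThenTokenize {

    // Joins the inputs with a single space, then walks the result with StringTokenizer.
    static List<String> tokenize(String... inputs) {
        String joined = String.join(" ", inputs); // assumption: Java 8+ is available
        List<String> tokens = new LinkedList<>();
        StringTokenizer st = new StringTokenizer(joined); // default whitespace delimiters
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("My mom cook everyday", "I eat everyday", "Am I fat?"));
    }
}
```

Joining first copies every character once more into a new String, so tokenizing each input separately in a loop may be cheaper for large inputs.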

Are you looking for an efficient or performant solution (i.e. what are your constraints or reference performance)?

Upvotes: 0

AlexR

Reputation: 115378

or just Arrays.asList((s1 + " " + s2 + " " + s3).split("\\s+"))
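
One caveat with the one-liner above: Arrays.asList returns a fixed-size list backed by the array, so calling add or remove on it throws UnsupportedOperationException. If the token list has to grow later, a sketch like the following copies it into an ArrayList; the class name ConcatSplit and the method tokenize are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not part of the original answer.
final class ConcatSplit {

    // Concatenates the three strings with spaces, splits once, and returns a resizable copy.
    static List<String> tokenize(String s1, String s2, String s3) {
        String[] parts = (s1 + " " + s2 + " " + s3).split("\\s+");
        return new ArrayList<>(Arrays.asList(parts));
    }

    public static void main(String[] args) {
        List<String> tokens = tokenize("My mom cook everyday", "I eat everyday", "Am I fat?");
        tokens.add("extra"); // works because the result is a real ArrayList
        System.out.println(tokens);
    }
}
```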

Upvotes: 0
