Reputation: 13686
I'm looking for an efficient way to obtain a list of String tokens extracted from multiple Strings (e.g. with a whitespace separator).
Example:
String s1 = "My mom cook everyday";
String s2 = "I eat everyday";
String s3 = "Am I fat?";
LinkedList<String> tokens = new LinkedList<String>();
//any code to efficiently get the tokens
//final result: tokens holds the following tokens:
//"My", "mom", "cook", "everyday", "I", "eat", "everyday", "Am", "I", "fat?".
Now:
LinkedList: I'm not sure it is the most effective collection class to use (might Apache Commons or Guava help?).
StringUtils from Apache Commons: its split method returns an array, so I would have to copy the Strings out of that array with a for loop. Is that efficient? I don't know, since split creates an array.
Splitter from Guava: but this post states that StringUtils is better in practice.
Scanner from java.util: it seems not to allocate any additional data structures. Is that right?
Please show the most efficient Java solution, even if it uses an additional widely used library such as Guava or Apache Commons.
Upvotes: 0
Views: 1318
Reputation: 7226
If you have small Strings and performance isn't an issue, you can just combine split with addAll like this:
String s1 = "My mom cook everyday";
String s2 = "I eat everyday";
String s3 = "Am I fat?";
List<String> tokens = new ArrayList<String>();
tokens.addAll(Arrays.asList(s1.split("\\s+")));
tokens.addAll(Arrays.asList(s2.split("\\s+")));
tokens.addAll(Arrays.asList(s3.split("\\s+")));
System.out.println(tokens);
However, if performance is an issue, here's an alternative solution:
Since you don't say how these long texts are acquired, I'll assume they come from an InputStream. See if this method is fast enough for your needs:
public List<String> readTokens(InputStream is) throws IOException {
    // No charset is given, so the platform default is used
    BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(is));
    List<String> tokens = new ArrayList<String>();
    String line;
    while ((line = bufferedReader.readLine()) != null) {
        // StringUtils comes from Apache Commons Lang
        for (String token : StringUtils.split(line, " ")) {
            tokens.add(token);
        }
    }
    return tokens;
}
And as to your statement regarding ArrayList vs LinkedList for inserting at the end, perhaps you should read this
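To try the stream-based approach without a file, a string can be wrapped in a ByteArrayInputStream. The sketch below is only an illustration: the class name StreamTokens is made up, UTF-8 is an assumed encoding, and StringTokenizer from the JDK stands in for StringUtils.split so that no extra library is needed.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical class name, used only for this illustration
public class StreamTokens {

    // Reads the stream line by line and collects space-separated tokens.
    static List<String> readTokens(InputStream is) throws IOException {
        List<String> tokens = new ArrayList<>();
        // UTF-8 is an assumption; use whatever encoding the input really has
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(is, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // StringTokenizer skips runs of the delimiter, so no empty tokens appear
                StringTokenizer st = new StringTokenizer(line, " ");
                while (st.hasMoreTokens()) {
                    tokens.add(st.nextToken());
                }
            }
        }
        return tokens;
    }

    public static void main(String[] args) throws IOException {
        String text = "My mom cook everyday\nI eat everyday\nAm I fat?";
        InputStream is = new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8));
        // prints [My, mom, cook, everyday, I, eat, everyday, Am, I, fat?]
        System.out.println(readTokens(is));
    }
}
```

Note that the try-with-resources block closes the underlying stream when reading finishes, which may or may not be what the caller wants.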
Upvotes: 4
Reputation: 195
import java.util.ArrayList;
import java.util.Collections;

public class stringintotoken {
    String s = "my name is tarun bharti";
    ArrayList<String> words = new ArrayList<String>();

    public static void main(String[] args) {
        stringintotoken st = new stringintotoken();
        st.go();
    }

    public void go() {
        wordlist();
        System.out.println(words);
        Collections.sort(words);
        System.out.println(words);
    }

    public void wordlist() {
        String[] tokens = s.split(" ");
        for (int i = 0; i < tokens.length; i++) {
            words.add(tokens[i]);
        }
    }
}
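One caveat with split(" "): when the input contains two or more spaces in a row, the result includes empty tokens, whereas the regex "\\s+" treats a whole run of whitespace as one separator. A small sketch of the difference (the class and method names are invented for this example):

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical class name, used only for this illustration
public class SplitDelimiters {

    // Splits on every single space; consecutive spaces yield empty tokens.
    static List<String> splitOnSingleSpace(String s) {
        return Arrays.asList(s.split(" "));
    }

    // Splits on runs of whitespace; consecutive spaces count as one separator.
    static List<String> splitOnWhitespaceRuns(String s) {
        return Arrays.asList(s.split("\\s+"));
    }

    public static void main(String[] args) {
        String s = "my  name is"; // two spaces after "my"
        System.out.println(splitOnSingleSpace(s));    // prints [my, , name, is]
        System.out.println(splitOnWhitespaceRuns(s)); // prints [my, name, is]
    }
}
```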
Upvotes: 0
Reputation: 198341
for (String str : Arrays.asList(s1, s2, s3)) {
    Iterables.addAll(tokens, Splitter.on(' ').split(str));
}
would be the way I'd do it. That said, ArrayList is preferable to LinkedList for almost all use cases; without further data, we really can't tell whether or not you're in one of those rare cases where LinkedList is preferable.
Upvotes: 5
Reputation: 2811
First join your strings using your separator (see Join a string using delimiters). Then:
LinkedList<String> tokens = new LinkedList<String>();
StringTokenizer st = new StringTokenizer(yourstr); // default delimiters: space, tab, newline, carriage return, form feed
while (st.hasMoreTokens()) {
    tokens.add(st.nextToken());
}
Are you looking for an efficient or a performant solution (i.e. what are your constraints or your reference performance)?
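Put together, the join-then-tokenize idea might look like the sketch below. It assumes Java 8 or later for String.join; the class name JoinThenTokenize and the method tokenize are made up for the example, and an ArrayList is used here although a LinkedList would work the same way through the List interface.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical class name, used only for this illustration
public class JoinThenTokenize {

    // Joins all inputs with a space, then tokenizes the result in one pass.
    static List<String> tokenize(String... inputs) {
        String joined = String.join(" ", inputs); // String.join requires Java 8+
        List<String> tokens = new ArrayList<>();
        // No delimiter argument: StringTokenizer uses its default whitespace set
        StringTokenizer st = new StringTokenizer(joined);
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [My, mom, cook, everyday, I, eat, everyday, Am, I, fat?]
        System.out.println(tokenize("My mom cook everyday", "I eat everyday", "Am I fat?"));
    }
}
```

Joining first costs one extra copy of the whole text, so whether this beats tokenizing each string separately depends on the input sizes.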
Upvotes: 0