Jason

Reputation: 11363

Is there a more efficient way than String.split() to break up a String into words?

My current project requires a search to be run on the lyrics of a song, which is a String field in the Song object. To make searches more efficient, I dump the lyric words into a set when the Song object is created: I call String.split("[^a-zA-Z]") to make a String array, then add its elements to a set.

Is there a specific way to add the words to a set without the intermediate step of creating an array?

Upvotes: 1

Views: 1560

Answers (4)

Powerlord

Reputation: 88786

I don't know about efficiency, but alternatively, you could do it like this:

import java.io.IOException;
import java.io.StringReader;
import java.util.HashSet;
import java.util.Set;

// ...

public static Set<String> getLyricSet(String lyrics) throws IOException {
    StringReader sr = new StringReader(lyrics);
    StringBuilder sb = new StringBuilder();
    Set<String> set = new HashSet<String>();
    int current;
    // Read characters one by one; read() returns -1 when we're done
    while ((current = sr.read()) != -1) {
        if (Character.isWhitespace(current)) {
            // End of word: add it to the set, skipping the empty
            // strings that consecutive whitespace would produce.
            if (sb.length() > 0) {
                set.add(sb.toString());
                sb.setLength(0);
            }
        } else {
            sb.append((char) current);
        }
    }
    // End of lyrics: add the last word, if any.
    if (sb.length() > 0) {
        set.add(sb.toString());
    }
    sr.close();

    return set;
}

Upvotes: 0

Are you searching for particular words in a particular song? If so, you may not need a set at all: you can run the search directly on the lyrics as soon as you have them. A plain regular expression will do, and it may well be faster than splitting the String, putting the pieces into a set and then querying the set:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexpExample {

    public static void main(String[] args) {
        String song = "Is this a real life? Is this just fantasy?";
        String toFind = "is";

        // Case-insensitive search; note that this matches "is" anywhere,
        // including inside longer words such as "this".
        Pattern p = Pattern.compile(toFind, Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(song);

        while (m.find()) {
            String found = m.group();
            int startIndex = m.start();
            int endIndex = m.end();

            System.out.println(found + " at start " + startIndex + ", end " + endIndex);
            // do something with this info...
        }
    }
}

It will output this:

Is at start 0, end 2
is at start 5, end 7
Is at start 21, end 23
is at start 26, end 28
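
Two of the four matches above fall inside the word "this". If only whole words should count, one option is to wrap the search term in `\b` word boundaries. The following is a minimal sketch under that assumption; the class name `WholeWordSearch` and the method `findWholeWord` are made up for illustration, and it assumes the search term consists of letters only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WholeWordSearch {

    // Returns the start index of every case-insensitive, whole-word
    // occurrence of `word` in `text`. (Hypothetical helper, not from the post.)
    static List<Integer> findWholeWord(String text, String word) {
        // Pattern.quote makes the term literal; \b anchors both ends at word boundaries.
        Pattern p = Pattern.compile("\\b" + Pattern.quote(word) + "\\b",
                                    Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(text);
        List<Integer> starts = new ArrayList<Integer>();
        while (m.find()) {
            starts.add(m.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        String song = "Is this a real life? Is this just fantasy?";
        System.out.println(findWholeWord(song, "is")); // prints [0, 21]
    }
}
```

With the same input as above, only the two standalone occurrences of "Is" are reported.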

If, however, you need to search across several songs, you can concatenate their lyrics with a StringBuilder, call StringBuilder#toString, and run the same search on the resulting String.
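
As a rough sketch of that idea: the class `MultiSongSearch` and the method `countMatches` below are hypothetical names, and the newline appended after each song is an added assumption so that a match cannot span two songs. Note that this approach no longer tells you which song a match came from.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MultiSongSearch {

    // Counts case-insensitive occurrences of `toFind` across all lyrics.
    // (Hypothetical helper for illustration only.)
    static int countMatches(List<String> allLyrics, String toFind) {
        StringBuilder sb = new StringBuilder();
        for (String lyrics : allLyrics) {
            // Separator is an assumption: it keeps matches from spanning songs.
            sb.append(lyrics).append('\n');
        }
        Pattern p = Pattern.compile(Pattern.quote(toFind), Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(sb.toString());
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> songs = Arrays.asList("Is this a real life?", "Is this just fantasy?");
        System.out.println(countMatches(songs, "is")); // prints 4
    }
}
```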

Upvotes: 1

novalis

Reputation: 1142

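import java.util.HashSet;
import java.util.StringTokenizer;

// StringTokenizer splits on whitespace by default
StringTokenizer st = new StringTokenizer("the days go on and on without you here");
HashSet<String> words = new HashSet<String>();
while (st.hasMoreTokens()) {
    words.add(st.nextToken());
}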

Upvotes: 0

StriplingWarrior

Reputation: 156459

Is there a specific way to add the words to a set without the intermediate step of creating an array?

Sure, you could write a method that returns an Iterator object, which feeds out one word at a time.

But something like this really isn't worth optimizing away. Your array will easily be small enough to fit into memory, its creation won't be that expensive, and the garbage collector will clean it up afterward.
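
For completeness, the iterator idea could look roughly like the sketch below. It is one possible shape, not a recommended optimization: `WordIterator` and `toSet` are hypothetical names, the word pattern `[a-zA-Z]+` is assumed as the counterpart of the `[^a-zA-Z]` delimiter from the question, and it relies on the default `Iterator.remove()` from Java 8 or later.

```java
import java.util.HashSet;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Feeds out one word at a time without building an intermediate array.
class WordIterator implements Iterator<String> {
    // A "word" is a run of ASCII letters (assumption based on the question).
    private static final Pattern WORD = Pattern.compile("[a-zA-Z]+");

    private final Matcher matcher;
    private boolean hasNext;

    WordIterator(String text) {
        matcher = WORD.matcher(text);
        hasNext = matcher.find(); // look ahead to the first word
    }

    @Override
    public boolean hasNext() {
        return hasNext;
    }

    @Override
    public String next() {
        if (!hasNext) {
            throw new NoSuchElementException();
        }
        String word = matcher.group();
        hasNext = matcher.find(); // advance to the following word, if any
        return word;
    }

    // Convenience helper: collect every distinct word of the lyrics into a set.
    static Set<String> toSet(String lyrics) {
        Set<String> words = new HashSet<String>();
        Iterator<String> it = new WordIterator(lyrics);
        while (it.hasNext()) {
            words.add(it.next());
        }
        return words;
    }
}
```

Calling `WordIterator.toSet(lyrics)` fills the set directly, and because the pattern matches only non-empty runs of letters, no empty strings end up in it.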

Upvotes: 1
