user3835720

Reputation: 81

Java: Searching text for Multiple Occurrences

Q: If I were given an exorbitantly large file filled with random English words and were told to find specific two-word phrases separated by whitespace (for example, "how now", "brown cow", etc.) and then return the position at which each one appears, how would I do it?

A: I have a partial solution, but I'm asking the Stack Overflow community for help completing the last bit.

How Program Should Run:

Solution 1:

// Needs java.io.File, java.util.Scanner, java.util.regex.Matcher, java.util.regex.Pattern
int chn = 0; // occurrences of "how now"
int cbc = 0; // occurrences of "brown cow"

Scanner in = new Scanner(new File("filename.txt"));

Pattern phn = Pattern.compile("how now");
Pattern pbc = Pattern.compile("brown cow");

// hasNextLine() pairs with nextLine(), so the last line is processed too
while (in.hasNextLine()) {
    String temp = in.nextLine();

    Matcher mhn = phn.matcher(temp);
    while (mhn.find()) chn++;

    Matcher mbc = pbc.matcher(temp);
    while (mbc.find()) cbc++;
}
in.close(); // Formatted output comes after

The thing is, while this keeps track of the number of occurrences (chn, cbc) using Patterns and Matchers, reports matches in the order they appear, and is the fastest approach I have found, I'm at a loss for how to keep track of where in the line each match occurs.

Solution 2:

// Needs java.io.File, java.util.ArrayList, java.util.Scanner
Scanner in = new Scanner(new File("filename.txt"));
ArrayList<String> wordsInLine = new ArrayList<>();

int ctL = 1; // current line number

while (in.hasNextLine()) {
    String temp = in.nextLine();
    if (temp.contains("how now")) {
        wordsInLine.clear(); // drop the words of the previous matching line
        for (String word : temp.split(" ")) {
            wordsInLine.add(word);
        }
        // Stop one short of the end so that get(i + 1) stays in bounds
        for (int i = 0; i < wordsInLine.size() - 1; i++) {
            if (wordsInLine.get(i).equals("how") &&
                wordsInLine.get(i + 1).equals("now")) {

                System.out.println("This returns line count and "
                    + "the occurrence by getting i");
            }
        }
    }

    ctL++;
}

But this second partial solution seems incredibly inefficient and terribly slow, using two for loops for every line that contains "how now."
Is there a more elegant way of doing this?

Upvotes: 2

Views: 1329

Answers (2)

Rajesh

Reputation: 2155

Go with Solution 1. Use the Matcher's start(), end(), and group() methods to find where each match begins, where it ends, and what text it matched:

mhn = phn.matcher(temp);

while (mhn.find()) {
    System.out.print(mhn.start() + ", ");
    System.out.print(mhn.end() + ", ");
    System.out.println(mhn.group());
    chn++; // the "how now" counter from Solution 1
}
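
For readers who want something they can run without a file, here is a minimal sketch of the same idea. The class name PhraseLocator, the method locate, and the "line:start-end text" output format are illustrative assumptions, not part of the question's code. It takes the lines as a list so the input can live in memory.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the name and the output format are illustrative only.
class PhraseLocator {

    // Returns one "line:start-end text" entry for every match of phrase.
    static List<String> locate(List<String> lines, String phrase) {
        // Pattern.quote makes the phrase match literally, not as a regex.
        Pattern pattern = Pattern.compile(Pattern.quote(phrase));
        List<String> hits = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            Matcher m = pattern.matcher(lines.get(i));
            while (m.find()) {
                // start() is the index of the first matched character;
                // end() is the index just past the last matched character.
                hits.add((i + 1) + ":" + m.start() + "-" + m.end() + " " + m.group());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("how now brown cow", "say how now, how now");
        for (String hit : locate(lines, "how now")) {
            System.out.println(hit);
        }
    }
}
```

With the two sample lines in main, this prints 1:0-7 how now, 2:4-11 how now, and 2:13-20 how now. Line numbers are 1-based here and column indices are 0-based, which is a choice made for the sketch.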

Upvotes: 0

chiwangc

Reputation: 3577

Solution 1 is definitely much more efficient and I would go for that approach for sure.

In order to keep track of the position of the matched pattern in a specific line, you can use the start() or the end() method of the Matcher class to get the corresponding indices.
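
As a rough sketch of how this could fit together with a line counter, the example below scans the input once per line with a single alternation pattern instead of one Matcher per phrase. The class name MultiPhraseScan, the method scan, and the use of a Reader (so that a StringReader can stand in for the file) are assumptions made for illustration. For a real file, a FileReader could be passed instead.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class; names are illustrative, not from the question.
class MultiPhraseScan {

    // One pattern that matches either phrase.
    static final Pattern PHRASES = Pattern.compile("how now|brown cow");

    // Prints "line:column phrase" for each match and returns per-phrase counts.
    static Map<String, Integer> scan(Reader source) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            int lineNumber = 0;
            while ((line = reader.readLine()) != null) {
                lineNumber++;
                Matcher m = PHRASES.matcher(line);
                while (m.find()) {
                    // m.start() is the 0-based column where the match begins.
                    System.out.println(lineNumber + ":" + m.start() + " " + m.group());
                    counts.merge(m.group(), 1, Integer::sum);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return counts;
    }

    public static void main(String[] args) {
        String text = "how now brown cow\nthe brown cow said how now\n";
        System.out.println(scan(new StringReader(text)));
    }
}
```

Reading line by line keeps only one line in memory at a time, which matters for a very large file. The trade-off is that a phrase split across a line break is not found.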

Upvotes: 2
