jk2016

Reputation: 29

Find word using Java

I am trying to write a Java class that finds every word surrounded by ( ) in a text file and outputs each word together with its number of occurrences, one word per line.

How can I write this in Java?

Input file

School (AAA) to (AAA) 10/22/2011 ssss(ffs)
(ffs) 7368 House 8/22/2011(h76yu)  come 789  (AAA)
Car (h76yu) to  (h76yu) extract9998790
2/3/2015 (AAA) 

Output file

(AAA) 4    
(ffs) 2    
(h76yu) 3 

This is what I have so far:

public class  FindTextOccurances  {
public static void main(String[] args) throws IOException {

    int sum=0
    String line = value.toString();

    for (String word : line.split("(\\W+")) {
        if (word.charAt(0) == '(' ) {
            if (word.length() > 0) {
                sum +=line.get();
            }
            context.write(new Text(word), new IntWritable(sum));
        } 
    }
}

Upvotes: 2

Views: 102

Answers (3)

Akhil

Reputation: 88

This may help. I did it with regular expressions. I did not declare all the variables, so adjust them to your needs. I hope this solves your problem.

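 // Assumes `file` names the input file; needs imports from java.io and java.util.
 Map<String, Integer> counts = new TreeMap<>();
 try (BufferedReader fr = new BufferedReader(new InputStreamReader(new FileInputStream(file), "ASCII")))
    {
        String line;
        while ((line = fr.readLine()) != null)
        {
            // Split on whitespace, and also just before "(" and just after ")",
            // so that "ssss(ffs)" yields the tokens "ssss" and "(ffs)".
            String[] words = line.split("\\s+|(?=\\()|(?<=\\))");
            for (String a : words)
            {
                if (a.matches("\\([^()]+\\)")) // the whole token has the form (xxx)
                {
                    counts.merge(a, 1, Integer::sum);
                }
            }
        }
    }
 for (Map.Entry<String, Integer> e : counts.entrySet())
    {
        System.out.println(e.getKey() + " " + e.getValue());
    }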

Upvotes: 0

Andy Turner

Reputation: 140318

You can find the text between brackets without splitting or using regular expressions like so (assuming that all brackets are closed, and you don't have nested brackets):

int lastBracket = -1;
while (true) {
  int start = line.indexOf('(', lastBracket + 1);
  if (start == -1) {
    break;
  }
  int end = line.indexOf(')', start + 1);

  // substring's end index is exclusive, so this prints the text between the brackets
  System.out.println(line.substring(start + 1, end));

  lastBracket = end;
}
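
To go from printing each bracketed word to counting them, the same indexOf loop can feed a map. The following is only a sketch under a few assumptions: the class name BracketScan and the method countInto are made up for illustration, the sample lines are the ones from the question, and an unclosed bracket simply ends the scan of that line.

```java
import java.util.Map;
import java.util.TreeMap;

class BracketScan {
    // Adds every "(...)" token found in `line` to `counts`.
    // Assumption: brackets are not nested; an unclosed "(" stops the scan.
    static void countInto(String line, Map<String, Integer> counts) {
        int from = 0;
        while (true) {
            int start = line.indexOf('(', from);
            if (start == -1) {
                break;
            }
            int end = line.indexOf(')', start + 1);
            if (end == -1) {
                break; // no closing bracket on this line
            }
            // Keep the brackets in the key, as in the desired output file.
            counts.merge(line.substring(start, end + 1), 1, Integer::sum);
            from = end + 1;
        }
    }

    public static void main(String[] args) {
        String[] lines = {
            "School (AAA) to (AAA) 10/22/2011 ssss(ffs)",
            "(ffs) 7368 House 8/22/2011(h76yu)  come 789  (AAA)",
            "Car (h76yu) to  (h76yu) extract9998790",
            "2/3/2015 (AAA) "
        };
        // TreeMap keeps the words in sorted order for printing.
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            countInto(line, counts);
        }
        counts.forEach((word, n) -> System.out.println(word + " " + n));
    }
}
```

With the question's sample input this prints (AAA) 4, (ffs) 2 and (h76yu) 3, each on its own line.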

Upvotes: 1

Diego Martinoia

Reputation: 4662

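If you split on a pattern that matches the parenthesized words, you are going to keep ALL the things that ARE NOT between parentheses, because split throws away whatever the pattern matches. Also note that in a regex an unescaped ( ) pair is a capturing group and \W means a non-word character, so "(\W+)" does not match a parenthesized word at all: the literal parentheses have to be escaped, and the word itself is \w+.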
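
A small demonstration of why splitting goes the wrong way, as a sketch: the class name SplitDemo and the sample string are made up for illustration, and the pattern "\\(\\w+\\)" (escaped parentheses around word characters) is an assumption about what a parenthesized word looks like.

```java
import java.util.Arrays;

class SplitDemo {
    // Splits on parenthesized words; split() discards what the pattern matches.
    static String[] splitOnParenthesized(String s) {
        return s.split("\\(\\w+\\)");
    }

    public static void main(String[] args) {
        String[] rest = splitOnParenthesized("Car (h76yu) to (h76yu) end");
        // Only the text outside the parentheses survives.
        System.out.println(Arrays.toString(rest)); // prints [Car ,  to ,  end]
    }
}
```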

What you want is a matcher:

import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
...
Map<String, Integer> occurrences = new HashMap<>();
// \\( and \\) match literal parentheses; \\w+ matches the word between them
Matcher m = Pattern.compile("\\(\\w+\\)").matcher(myString);
while (m.find()) {
  String matched = m.group();
  String word = matched.substring(1, matched.length() - 1); // remove parentheses
  occurrences.put(word, occurrences.getOrDefault(word, 0) + 1);
}
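
Putting the matcher idea together with the file handling from the question, a complete program could look roughly like the sketch below. The class name ParenWordCounter, the method count, and taking the input and output paths as command-line arguments are all assumptions made for illustration. The pattern assumes the words consist of letters, digits, or underscores, and the keys keep their parentheses to match the desired output file.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ParenWordCounter {
    // A literal "(", one or more word characters, then a literal ")".
    private static final Pattern PAREN_WORD = Pattern.compile("\\(\\w+\\)");

    // Counts every parenthesized word across all lines, sorted by word.
    static Map<String, Integer> count(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            Matcher m = PAREN_WORD.matcher(line);
            while (m.find()) {
                counts.merge(m.group(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java ParenWordCounter <input file> <output file>");
            return;
        }
        List<String> report = new ArrayList<>();
        count(Files.readAllLines(Paths.get(args[0])))
                .forEach((word, n) -> report.add(word + " " + n));
        // One "word count" pair per line in the output file.
        Files.write(Paths.get(args[1]), report);
    }
}
```

Using a TreeMap rather than a HashMap is only there to make the output order predictable; a HashMap counts just as well if the order does not matter.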

Upvotes: 0
