Reputation: 29
I am trying to write a Java class that finds the words surrounded by parentheses in a text file and outputs each word with its number of occurrences, one per line.
How can I write this in Java?
Input file
School (AAA) to (AAA) 10/22/2011 ssss(ffs)
(ffs) 7368 House 8/22/2011(h76yu) come 789 (AAA)
Car (h76yu) to (h76yu) extract9998790
2/3/2015 (AAA)
Output file
(AAA) 4
(ffs) 2
(h76yu) 3
This is what I have so far:
public class FindTextOccurances {
    public static void main(String[] args) throws IOException {
        int sum = 0;
        String line = value.toString();
        for (String word : line.split("(\\W+")) {
            if (word.charAt(0) == '(') {
                if (word.length() > 0) {
                    sum += line.get();
                }
                context.write(new Text(word), new IntWritable(sum));
            }
        }
    }
}
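As a quick illustration of why the split-based attempt above cannot produce the counts, here is a small, self-contained sketch (the class name `SplitDemo` and its two helper methods are hypothetical). It shows that `"(\\W+"` is not a valid regex at all, because the group is never closed, and that `"(\\W+)"`, where the parentheses are a regex group rather than literal brackets, splits on every run of non-word characters and throws the brackets away.

```java
import java.util.Arrays;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class SplitDemo {
    // Returns true if the regex compiles, false if Pattern rejects it.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    // The parentheses here are a capturing group, not literal brackets,
    // so this splits on every run of non-word characters.
    static String[] splitOnNonWord(String line) {
        return line.split("(\\W+)");
    }

    public static void main(String[] args) {
        String line = "School (AAA) to (AAA) 10/22/2011 ssss(ffs)";
        // "(\\W+" opens a group that is never closed, so String.split would throw
        System.out.println(compiles("(\\W+)") + " " + compiles("(\\W+"));
        // The brackets are gone, so nothing tells you which words were bracketed
        System.out.println(Arrays.toString(splitOnNonWord(line)));
    }
}
```

The second line printed is `[School, AAA, to, AAA, 10, 22, 2011, ssss, ffs]`: the bracketed words are still there, but indistinguishable from the rest.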
Upvotes: 2
Views: 102
Reputation: 88
This may help; I did it with regular expressions. I did not declare all of the variables, so adjust them to your needs. I hope this solves your problem.
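import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.InputStreamReader;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
...
// file is your input file; counts keeps the words in first-seen order
Map<String, Integer> counts = new LinkedHashMap<>();
Pattern bracketed = Pattern.compile("\\(\\w+\\)"); // "(", word characters, ")"
try (BufferedReader fr = new BufferedReader(new InputStreamReader(new FileInputStream(file), "ASCII"))) {
    String line;
    while ((line = fr.readLine()) != null) {
        String[] words = line.split(" "); // those are your words
        for (String a : words) {
            // search inside each token, because in "ssss(ffs)" the bracket is glued to other text
            Matcher m = bracketed.matcher(a);
            while (m.find()) {
                counts.merge(m.group(), 1, Integer::sum);
            }
        }
    }
}
for (Map.Entry<String, Integer> e : counts.entrySet()) {
    System.out.println(e.getKey() + " " + e.getValue());
}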
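A self-contained version of the same regex idea, reading the input file and writing an output file in the `(word) count` format from the question, might look like the sketch below. The class name `BracketWordCount`, the two command-line arguments, and the ASCII encoding are assumptions. It runs the matcher on whole lines instead of on space-separated tokens; because the pattern contains no spaces, every match lies inside a single token anyway, so the counts are the same.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BracketWordCount {
    // Literal "(", one or more word characters [A-Za-z0-9_], literal ")"
    private static final Pattern BRACKETED = Pattern.compile("\\(\\w+\\)");

    // Reads the input line by line and counts each "(word)", keeping first-seen order.
    static Map<String, Integer> count(Path input) throws IOException {
        Map<String, Integer> counts = new LinkedHashMap<>();
        try (BufferedReader reader = Files.newBufferedReader(input, StandardCharsets.US_ASCII)) {
            String line;
            while ((line = reader.readLine()) != null) {
                Matcher m = BRACKETED.matcher(line);
                while (m.find()) {
                    counts.merge(m.group(), 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    // Writes one "word count" pair per line, e.g. "(AAA) 4".
    static void write(Map<String, Integer> counts, Path output) throws IOException {
        try (PrintWriter out = new PrintWriter(Files.newBufferedWriter(output, StandardCharsets.US_ASCII))) {
            for (Map.Entry<String, Integer> e : counts.entrySet()) {
                out.println(e.getKey() + " " + e.getValue());
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: java BracketWordCount <input file> <output file>");
            return;
        }
        write(count(Paths.get(args[0])), Paths.get(args[1]));
    }
}
```

With the sample input from the question, the output file should contain `(AAA) 4`, `(ffs) 2` and `(h76yu) 3`, in that order, since `LinkedHashMap` preserves the order in which each word is first seen.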
Upvotes: 0
Reputation: 140318
You can find the text between brackets without splitting or using regular expressions like so (assuming that all brackets are closed, and you don't have nested brackets):
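int lastBracket = -1;
while (true) {
    int start = line.indexOf('(', lastBracket + 1);
    if (start == -1) {
        break; // no more opening brackets on this line
    }
    int end = line.indexOf(')', start + 1);
    // substring's end index is exclusive, so this stops just before ')'
    System.out.println(line.substring(start + 1, end));
    lastBracket = end; // continue searching after the closing bracket
}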
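The same `indexOf` scan can also do the counting the question asks for. In the sketch below (hypothetical class `BracketIndexOfCounter`), the key keeps its brackets so it prints as `(AAA) 4` like the expected output, and, as an assumption beyond the answer, an unclosed `(` ends the scan instead of throwing. Unlike the regex versions, it counts whatever sits between the brackets, including spaces or an empty `()`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class BracketIndexOfCounter {
    // Adds every "(...)" on one line to counts, without regex or split.
    // Assumes brackets are not nested; an unclosed "(" ends the scan.
    static void countLine(String line, Map<String, Integer> counts) {
        int from = 0;
        while (true) {
            int start = line.indexOf('(', from);
            if (start == -1) {
                break; // no more opening brackets
            }
            int end = line.indexOf(')', start + 1);
            if (end == -1) {
                break; // unclosed bracket: stop instead of throwing
            }
            // end + 1 keeps the ')' so the key reads "(AAA)"
            counts.merge(line.substring(start, end + 1), 1, Integer::sum);
            from = end + 1; // resume after the closing bracket
        }
    }

    public static void main(String[] args) {
        String[] lines = {
            "School (AAA) to (AAA) 10/22/2011 ssss(ffs)",
            "(ffs) 7368 House 8/22/2011(h76yu) come 789 (AAA)",
            "Car (h76yu) to (h76yu) extract9998790",
            "2/3/2015 (AAA)"
        };
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            countLine(line, counts);
        }
        counts.forEach((word, n) -> System.out.println(word + " " + n));
    }
}
```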
Upvotes: 1
Reputation: 4662
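If you split on "(\W+)", the parentheses in the pattern form a regex group rather than matching literal brackets, so the line is split on every run of non-word characters: you get back all the words, and the brackets that mark the ones you care about are thrown away.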
What you want is a matcher:
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
...
Map<String, Integer> occurrences = new HashMap<>();
// \( and \) match literal brackets; \w+ matches the word between them
Matcher m = Pattern.compile("\\(\\w+\\)").matcher(myString);
while (m.find()) {
    String matched = m.group();
    String word = matched.substring(1, matched.length() - 1); // remove parentheses
    occurrences.put(word, occurrences.getOrDefault(word, 0) + 1);
}
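On Java 9 or later, `Matcher.results()` lets you write the matcher loop as a stream. This is a sketch, not the answerer's code: the class name `BracketStreamCounter` is hypothetical, group 1 of the pattern captures the word without its brackets, and a `TreeMap` is used so the words come out sorted, which for the sample input happens to match the order in the question.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class BracketStreamCounter {
    // Group 1 captures the word between the literal brackets
    private static final Pattern BRACKETED = Pattern.compile("\\((\\w+)\\)");

    // Counts each bracketed word in the whole text; TreeMap keeps the keys sorted.
    static Map<String, Long> count(String text) {
        return BRACKETED.matcher(text).results() // Stream<MatchResult>, Java 9+
                .map(r -> r.group(1))
                .collect(Collectors.groupingBy(Function.identity(), TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        String text = "School (AAA) to (AAA) 10/22/2011 ssss(ffs)\n"
                + "(ffs) 7368 House 8/22/2011(h76yu) come 789 (AAA)\n"
                + "Car (h76yu) to (h76yu) extract9998790\n"
                + "2/3/2015 (AAA)\n";
        // Put the brackets back when printing, to match the question's output format
        count(text).forEach((word, n) -> System.out.println("(" + word + ") " + n));
    }
}
```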
Upvotes: 0