Saad

Reputation: 397

Java: Processing input from a file

I am working through a past sample final exam. One question asks me to read input from a file and process it into words. The end of a sentence is marked by any word that ends with one of the three characters . ? !

I was able to write code for this, but I can only split the input into sentences, using the Scanner class and useDelimiter. I want to process the input word by word instead: if a word ends in one of the sentence separators above, I stop adding words to the current Sentence object. Any help would be appreciated, as I am learning this on my own. This is what I came up with so far:

File file = new File("finalq4.txt");
Scanner scanner = new Scanner(file);
scanner.useDelimiter("[.?!]");
while(scanner.hasNext()){
    sentCount++;
    line = scanner.next();
    line = line.replaceAll("\\r?\\n", " ");
    line = line.trim();
    StringTokenizer tokenizer = new StringTokenizer(line, " ");
    wordsCount += tokenizer.countTokens();
    sentences.add(new Sentence(line,wordsCount));
    for(int i = 0; i < line.replaceAll(",|\\s+|'|-","").length(); i++){
        currentChar = line.charAt(i);
        if (Character.isDigit(currentChar)) {
        }else{
            lettersCount++;
        }
    }
}

This code splits the input into sentences with useDelimiter, counts the words and letters of the entire file, and stores each sentence in a Sentence object.

If I want to split the input into words instead, how can I do that without using the Scanner class?

Some of the input from the file that I have to process is here:

Text that follows is based on the Wikipedia page on cryptography!

Cryptography is the practice and study of hiding information. In modern times, cryptography is considered to be a branch of both mathematics and computer science, and is affiliated closely with information theory, computer security, and engineering. Cryptography is used in applications present in technologically advanced societies; examples include the security of ATM cards, computer passwords, and electronic commerce, which all depend on cryptography.....

I can elaborate further on this question if it needs explanation.

What I want is to keep adding words to the current Sentence object and stop when a word ends in one of the sentence separators above. Then I read the next word and keep adding words to a new sentence until I hit another separator.

Upvotes: 1

Views: 322

Answers (3)

Saad

Reputation: 397

I have tried several techniques for this question, and one of them is the approach above. I was also able to solve it with another approach that does not use the Scanner class. This one was much more accurate and gave me the exact output, whereas with the code above I was off by a few words and letters.

try {
    input = new BufferedReader(new FileReader("file.txt"));
    strLine = input.readLine();
    while (strLine != null) {
        String[] tokens = strLine.split("\\s+");
        for (int i = 0; i < tokens.length; i++) {
            String s = tokens[i];
            // Blank lines and leading whitespace produce empty tokens; skip them.
            if (s.isEmpty()) {
                continue;
            }
            wordsJoin += s + " ";
            wordCount++; // one more word in the current sentence

            // Count letters only, ignoring digits and punctuation.
            String charString = s.replaceAll("[^a-zA-Z ]", "");
            for (int k = 0; k < charString.length(); k++) {
                currentChar = charString.charAt(k);
                if (Character.isLetter(currentChar)) {
                    lettersCount++;
                }
            }
            // A word ending in . ? or ! closes the current sentence.
            char last = s.charAt(s.length() - 1);
            if (last == '.' || last == '?' || last == '!') {
                sentences.add(new Sentence(wordsJoin, wordCount));
                sentCount++;
                numOfWords += countWords(wordsJoin);
                wordsJoin = "";
                wordCount = 0;
            }
        }
        strLine = input.readLine();
    }
    input.close();
} catch (IOException e) {
    e.printStackTrace();
}

This might be useful for anyone doing the same problem, or anyone who just needs an idea of how to count letters, words and sentences in a text file.
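For anyone who wants something they can compile as-is, here is a minimal self-contained sketch of the same word-by-word idea. The class name WordSentenceSplitter and its method names are made up for illustration, and it returns plain strings rather than the Sentence class from the exam, so treat it as one possible shape rather than the expected solution.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of the original exam code.
class WordSentenceSplitter {

    /** True if the (non-empty) word ends with one of . ? ! */
    static boolean endsSentence(String word) {
        char last = word.charAt(word.length() - 1);
        return last == '.' || last == '?' || last == '!';
    }

    /** Reads line by line and groups words into sentences. */
    static List<String> readSentences(BufferedReader reader) throws IOException {
        List<String> sentences = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        String line;
        while ((line = reader.readLine()) != null) {
            for (String word : line.trim().split("\\s+")) {
                if (word.isEmpty()) {
                    continue; // blank line
                }
                if (current.length() > 0) {
                    current.append(' ');
                }
                current.append(word);
                if (endsSentence(word)) {
                    sentences.add(current.toString());
                    current.setLength(0); // start the next sentence
                }
            }
        }
        if (current.length() > 0) {
            sentences.add(current.toString()); // trailing words with no terminator
        }
        return sentences;
    }

    /** Convenience wrapper so the logic can be tried on an in-memory string. */
    static List<String> sentencesOf(String text) {
        try {
            return readSentences(new BufferedReader(new StringReader(text)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static int countWords(String sentence) {
        return sentence.trim().split("\\s+").length;
    }

    static int countLetters(String sentence) {
        int letters = 0;
        for (char c : sentence.toCharArray()) {
            if (Character.isLetter(c)) {
                letters++;
            }
        }
        return letters;
    }

    public static void main(String[] args) {
        String text = "Text about cryptography!\n\nIt hides information. Is it used\nin ATM cards? Yes";
        for (String s : sentencesOf(text)) {
            System.out.println(countWords(s) + " words, " + countLetters(s) + " letters: " + s);
        }
    }
}
```

To read a real file, pass new BufferedReader(new FileReader("file.txt")) to readSentences instead of going through sentencesOf. Note that a sentence may span several lines here, because the current sentence is only closed when a word ends in a terminator.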

Upvotes: 0

Lucurious

Reputation: 174

You can use a BufferedReader to read every line of the file and join the lines into one string. Then split that string into sentences with the split method, and finally split each sentence into words with the same method. In the end it would look something like this:

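import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

StringBuilder sb = new StringBuilder();
// try-with-resources closes the reader even if reading fails
try (BufferedReader br = new BufferedReader(new FileReader(fileName))) {
    String line;
    while ((line = br.readLine()) != null) {
        sb.append(line).append(' '); // keep words on adjacent lines apart
    }
} catch (IOException e) {
    e.printStackTrace();
}
// Split on any of the three sentence terminators: . ? !
String[] sentences = sb.toString().split("[.?!]");
for (String sentence : sentences) {
    String[] words = sentence.trim().split("\\s+");
    //Add words to sentence...
}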
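If you want each sentence to keep its terminator, one option is a lookbehind in the split pattern, so the split happens on the whitespace after a . ? or ! instead of on the character itself. Here is a small sketch of that variant; the class name RegexSentenceSplit and its methods are only illustrative, and it assumes the whole text is already in one string.

```java
// Hypothetical helper class showing split() with a lookbehind.
class RegexSentenceSplit {

    /** Splits on whitespace that follows . ? or !, so terminators stay attached. */
    static String[] sentences(String text) {
        return text.trim().split("(?<=[.?!])\\s+");
    }

    /** Splits one sentence into words on runs of whitespace. */
    static String[] words(String sentence) {
        return sentence.trim().split("\\s+");
    }

    public static void main(String[] args) {
        String text = "Hiding information. In modern times, it is a branch of both! Is it?";
        for (String sentence : sentences(text)) {
            System.out.println(words(sentence).length + " words: " + sentence);
        }
    }
}
```

This follows the rule from the question literally: any word ending in one of the three characters ends a sentence, so an abbreviation such as "e.g." in the middle of a sentence would also cause a split.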

Upvotes: 1

roopaliv

Reputation: 447

The snippet below should work:

import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public static void main(String[] args) throws FileNotFoundException {
    File file = new File("final.txt");
    List<Sentence> sentences = new ArrayList<Sentence>();
    // try-with-resources closes the scanner when reading is done
    try (Scanner scanner = new Scanner(file)) {
        scanner.useDelimiter("[.?!]");
        while (scanner.hasNext()) {
            String line = scanner.next().trim();
            if (!line.isEmpty()) { // skips the empty pieces from the ..... at the end
                // Split on runs of whitespace so line breaks and double spaces
                // do not produce extra words.
                String[] wordsOfLine = line.split("\\s+");
                int wordsCount = wordsOfLine.length;
                Sentence sentence = new Sentence(line, wordsCount);
                sentences.add(sentence);
            }
        }
    }
}

public class Sentence {
    String line = "";
    int wordsCount = 0;

    public Sentence(String line, int wordsCount) {
        this.line = line;
        this.wordsCount = wordsCount;
    }
}

Upvotes: 1
