Reputation: 631
I'm reading a text file and storing the set of unique words from it in an ArrayList (please suggest a better structure if there is one). I'm using a Scanner to read the file, with the delimiter set to " " (space), as follows:
ArrayList <String> allWords = new ArrayList <String> ();
ArrayList <String> Vocabulary = new ArrayList <String> ();
int count = 0;
Scanner fileScanner = null;
try {
fileScanner = new Scanner (new File (textFile));
} catch (FileNotFoundException e) {
System.out.println (e.getMessage());
System.exit(1);
}
fileScanner.useDelimiter(" ");
while (fileScanner.hasNext()) {
allWords.add(fileScanner.next().toLowerCase());
count++;
String distinctWord = (fileScanner.next().toLowerCase());
System.out.println (distinctWord.toString());
if (!allWords.contains(distinctWord)) {
Vocabulary.add(distinctWord);
}
}
When I print the contents of Vocabulary, every other word is skipped. For example, if I have the following text file:
"The quick brown fox jumps over the lazy dog"
The words printed are "quick fox over lazy", and then I get this error:
Exception in thread "main" java.util.NoSuchElementException
at java.util.Scanner.throwFor(Unknown Source)
at java.util.Scanner.next(Unknown Source)
at *java filename*.getWords(NaiveBayesTxtClass.java:82)
at *java filename*.main(NaiveBayesTxtClass.java:22)
Could anyone please suggest how to fix this? I have a feeling it's something to do with the fileScanner.useDelimiter and fileScanner.hasNext() statements.
Upvotes: 2
Views: 429
Reputation: 109597
Since you also asked about data structures, you could use a Set, which rejects duplicates on its own:
List<String> allWords = new ArrayList<String>();
SortedSet<String> Vocabulary = new TreeSet<String>();
int count = 0;
Scanner fileScanner = null;
try {
fileScanner = new Scanner(new File(textFile));
} catch (FileNotFoundException e) {
System.out.println(e.getMessage());
System.exit(1);
}
fileScanner.useDelimiter(" ");
while (fileScanner.hasNext()) {
String word = fileScanner.next().toLowerCase();
allWords.add(word);
if (Vocabulary.add(word)) {
System.out.print("+ ");
}
System.out.println(word);
}
As you can see, the variables are declared by interface type (List, SortedSet) and instantiated with a concrete class. This not only makes it easy to swap the implementation later, but is especially useful for method parameters.
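To illustrate the point about parameters, here is a minimal sketch. The class name VocabularyDemo and the method addAll are hypothetical names made up for this example. Because the parameter is typed as Set, the same method works with a TreeSet or a HashSet.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical demo class, not part of the code in the question.
class VocabularyDemo {

    // The parameter is declared by interface, so any Set implementation is accepted.
    // Returns how many words were new to the set.
    static int addAll(List<String> words, Set<String> vocabulary) {
        int added = 0;
        for (String word : words) {
            // Set#add returns true only if the element was not already present.
            if (vocabulary.add(word.toLowerCase())) {
                added++;
            }
        }
        return added;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList(
                "The", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog");

        Set<String> sorted = new TreeSet<>();  // keeps words in alphabetical order
        Set<String> hashed = new HashSet<>();  // no ordering guarantee

        System.out.println(addAll(words, sorted)); // prints 8
        System.out.println(sorted); // prints [brown, dog, fox, jumps, lazy, over, quick, the]
        System.out.println(addAll(words, hashed)); // prints 8
    }
}
```

Switching from TreeSet to HashSet only changes the line that creates the set; addAll stays the same.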
Upvotes: 2
Reputation: 285405
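You're calling Scanner#next() twice after checking hasNext() only once, so each loop iteration consumes two words.
You call it at (1) and add the result to allWords,
then call it again at (2) and print that result. With nine words in the file, the last iteration reads the ninth word at (1) and has nothing left for (2), which is where the NoSuchElementException comes from.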
while (fileScanner.hasNext()) {
allWords.add(fileScanner.next().toLowerCase()); // **** (1)
count++;
String distinctWord = (fileScanner.next().toLowerCase()); // **** (2)
System.out.println (distinctWord.toString());
if (!allWords.contains(distinctWord)) {
Vocabulary.add(distinctWord);
}
}
Solution: Call Scanner#next() once per iteration, save the returned String in a variable, then add that variable to the list and to a HashSet, and print it. For example:
while (fileScanner.hasNext()) {
String word = fileScanner.next().toLowerCase();
allWords.add(word); // **** (1)
count++;
// String distinctWord = (fileScanner.next().toLowerCase()); // **** (2)
System.out.println (word);
vocabularySet.add(word); // a HashSet
}
As a general safety rule, pair each call to Scanner#hasNextXXX() with exactly one call to Scanner#nextXXX().
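Here is a small self-contained sketch of that rule. It makes a few assumptions that are not in the question: it scans a String instead of a file so it runs without any input file, it uses Scanner's default whitespace delimiter rather than useDelimiter(" "), and the names WordReader and distinctWords are hypothetical. A LinkedHashSet is used only so the printed order matches the order of first appearance.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Scanner;
import java.util.Set;

// Hypothetical demo class, not part of the code in the question.
class WordReader {

    // Appends every word to allWords and returns the distinct words
    // in order of first appearance.
    static Set<String> distinctWords(String text, List<String> allWords) {
        Set<String> vocabulary = new LinkedHashSet<>();
        try (Scanner scanner = new Scanner(text)) {
            while (scanner.hasNext()) {                      // one check ...
                String word = scanner.next().toLowerCase();  // ... one read
                allWords.add(word);
                vocabulary.add(word); // duplicates are ignored by the set
            }
        }
        return vocabulary;
    }

    public static void main(String[] args) {
        List<String> allWords = new ArrayList<>();
        Set<String> vocabulary = distinctWords(
                "The quick brown fox jumps over the lazy dog", allWords);

        System.out.println(allWords.size()); // prints 9
        System.out.println(vocabulary);
        // prints [the, quick, brown, fox, jumps, over, lazy, dog]
    }
}
```

Every word is read exactly once, so nothing is skipped and next() is never called when the input is exhausted.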
Upvotes: 5