Traveling Saleswoman

Reputation: 55

How to scan for words in Java excluding punctuation

I'm trying to use the Scanner class to parse all the words in a file. The file contains ordinary text, but I only want the words, excluding all punctuation. The solution I have so far is incomplete and is already giving me a problem:

Scanner fileScan= new Scanner(file);
String word;
while(fileScan.hasNext("[^ ,!?.]+")){       
    word= fileScan.next();
    this.addToIndex(word, filename);
}

Now if I use this on a sentence like "hi my name is mario!" it returns just "hi", "my", "name" and "is". It doesn't match "mario!" (as expected), but it also doesn't return "mario", which I think it should.

Can you explain why that is, and help me find a better solution if you have one? Thank you.

Upvotes: 0

Views: 7137

Answers (2)

Bozho

Reputation: 597224

Since you want to get rid of the punctuation, you can simply replace all punctuation marks before adding to the index:

word = word.replaceAll("\\p{Punct}", "");

In the case of hyphens or other isolated punctuation marks, the result is an empty string, so just check word.isEmpty() before adding.

Of course, you'd have to get rid of the custom pattern you pass to hasNext().
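A minimal sketch of this approach, assuming the default whitespace delimiter and a plain String as input instead of a file. The class name `StripPunctuation` and the helper `words` are made up for illustration; in the question's code the non-empty word would be passed to addToIndex instead of being collected in a list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class StripPunctuation {

    // Hypothetical helper: reads whitespace-separated tokens, strips
    // punctuation from each one, and skips tokens that end up empty.
    static List<String> words(String text) {
        List<String> result = new ArrayList<>();
        try (Scanner scan = new Scanner(text)) {
            while (scan.hasNext()) {            // no custom pattern here
                String word = scan.next().replaceAll("\\p{Punct}", "");
                if (!word.isEmpty()) {          // e.g. a lone "-" becomes ""
                    result.add(word);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [hi, my, name, is, mario]
        System.out.println(words("hi - my name is mario!"));
    }
}
```

Note that \p{Punct} also removes apostrophes and in-word hyphens, so "don't" becomes "dont"; whether that is acceptable depends on what the index is for.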

Upvotes: 0

Miserable Variable

Reputation: 28761

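This works. Instead of passing the pattern to hasNext(), make the spaces and punctuation the delimiter with useDelimiter():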

import java.util.*;

class S {

    public static void main(String[] args) {
        Scanner fileScan= new Scanner("hi my name is mario!").useDelimiter("[ ,!?.]+");
        String word;
        while(fileScan.hasNext()){       
            word= fileScan.next();
            System.out.println(word);
        }

    } // end of main()
}


javac -g S.java && java S
hi
my
name
is
mario
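As for why the loop in the question stops early: hasNext(pattern) tests the whole next token, as split by the delimiter (whitespace by default), against the pattern. The last token is "mario!", which contains "!", so the test fails and the loop ends without consuming it. A small sketch to see this, where the class name `HasNextDemo` and the helper `firstRejectedToken` are made up for illustration:

```java
import java.util.Scanner;

class HasNextDemo {

    // Hypothetical helper: consumes tokens while they match the pattern,
    // then returns the first token that hasNext(pattern) refused
    // (or null if every token matched).
    static String firstRejectedToken(String text, String pattern) {
        try (Scanner scan = new Scanner(text)) {
            while (scan.hasNext(pattern)) {
                scan.next();
            }
            return scan.hasNext() ? scan.next() : null;
        }
    }

    public static void main(String[] args) {
        // prints mario!
        System.out.println(firstRejectedToken("hi my name is mario!", "[^ ,!?.]+"));
    }
}
```

The token is still in the input; it simply never matches, which is why changing the delimiter rather than the hasNext() pattern fixes it.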

Upvotes: 5
