Wexylwoxyl

Reputation: 3

How to filter commonly used words from document text? (Hash Maps)

Thank you for reading. I currently have a school project I am really stuck on. The purpose is to retrieve document text from the web and then store each word into a map object while omitting commonly used words like "which, about, during, after," etc.

Essentially it comes down to this:

//List of words to ignore

    Set<String> ignore = new HashSet<>(Arrays.asList(
        "after", "which", "later", "other", "during", "their", "about"));

//Will iterate through the document text (content) for words that match the word_pattern (let's say a word has 5 letters or more, for simplicity)

Matcher match = Pattern.compile(word_pattern).matcher(content);
while (match.find()) {
   String word = match.group().toLowerCase();

So now, in this while loop, I want to skip any word that is in the ignore set and otherwise add it to a map object, but I cannot seem to get it right and nothing seems to click for me. I could easily just add all the words to a map object and take some point deductions, but I would like to get this right for my sanity.

Upvotes: 0

Views: 101

Answers (1)

Jean-François Savard

Reputation: 21004

Your list of words to ignore is a Set, which offers the contains method, so you can simply add this condition inside your loop:

if (!ignore.contains(word)) {
    // word is not in the ignore set: add it to your map here
}
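
For completeness, here is a minimal sketch of how the whole loop might look with that condition in place. A few things here are assumptions and not from the question: the class name WordFilter, the method name countWords, the regex "[A-Za-z]{5,}" standing in for word_pattern, and the choice to store an occurrence count as the map value (the question does not say what the map should hold).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordFilter {
    // Words to skip, copied from the question.
    static final Set<String> IGNORE = new HashSet<>(Arrays.asList(
            "after", "which", "later", "other", "during", "their", "about"));

    // Assumed stand-in for word_pattern: runs of five or more ASCII letters.
    static final Pattern WORD_PATTERN = Pattern.compile("[A-Za-z]{5,}");

    // Hypothetical helper: maps each non-ignored word to how often it occurs.
    static Map<String, Integer> countWords(String content) {
        // TreeMap keeps the keys sorted, which makes the printed result predictable.
        Map<String, Integer> counts = new TreeMap<>();
        Matcher match = WORD_PATTERN.matcher(content);
        while (match.find()) {
            String word = match.group().toLowerCase();
            if (!IGNORE.contains(word)) {
                // Insert 1 for a new word, otherwise add 1 to the existing count.
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String content = "Which planet appears after sunset? The planet Venus appears later.";
        System.out.println(countWords(content));
        // prints {appears=2, planet=2, sunset=1, venus=1}
    }
}
```

Because the word is lowercased before the contains check, "Which" and "which" are both skipped. If you only need to record that a word was seen, without a count, a plain Set of words would do instead of a map.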

Upvotes: 2
