CookieMonst3r

Reputation: 5

How to Count Word Occurrences in a String Using a HashMap

How can I fix my code so that it produces the expected output? I can only edit a specific section of the code (marked in the comments below). Thank you very much.

This is my code

import java.util.HashMap;

public class OccurenceChecker {
    public static void main(String[] args) 
    { 

        //CANT BE FIXED
        String phrase = "Good Morning. Welcome to my store. My store is a grocery store.";

        HashMap<String, Integer> map = new HashMap<String, Integer>();
        String[] ignored = phrase.split("\n\t\r(){},:;!?.[]");

        //CAN BE FIX THIS POINT ON.
        for (String ignore : ignored) 
        {
            Integer count = map.get(ignore);
            if (count == null) 
            {
                count = 0;
            }
            map.put(ignore, count + 1);
        }

        for (int i = 0; i< ignored.length; i++)
        {
            System.out.println(ignored[i]);
        }
        System.out.println(map);
    }
}

EXPECTED OUTPUT

{a=1, Morning=1, grocery=1, Welcome=1, is=1, to=1, store=3, Good=1, my=2}

MY OUTPUT

{=2, a=1, Morning=1, grocery=1, Welcome=1, is=1, to=1, store=3, Good=1, my=1, My=1}

Upvotes: 1

Views: 5161

Answers (5)

holtkampjs

Reputation: 27

The unexpected {=2 entry appears because each "." followed by " " (a period, then a space) is treated as two separate delimiters, so the split produces an empty string between them.

A simple workaround is to skip any empty strings.

import java.util.HashMap;

public class OccurenceChecker {
    public static void main(String[] args) 
    { 

        //CANT BE FIXED
        String phrase = "Good Morning. Welcome to my store. My store is a grocery store.";

        HashMap<String, Integer> map = new HashMap<String, Integer>();
        String[] ignored = phrase.split("\n\t\r(){},:;!?.[]");

        //CAN BE FIX THIS POINT ON.
        for (String ignore : ignored) 
        {
            // Skip any empty strings
            if (ignore.length() == 0)
            {
                continue;
            }

            Integer count = map.get(ignore);
            if (count == null) 
            {
                count = 0;
            }
            map.put(ignore, count + 1);
        }

        for (int i = 0; i< ignored.length; i++)
        {
            System.out.println(ignored[i]);
        }
        System.out.println(map);
    }
}

The other issue in your output is my=1, My=1. The easiest fix is to convert every word to lowercase before storing it in, or looking it up in, the HashMap. One thing that confuses me is that the expected output seems to keep the capitalization of the first occurrence of each word. That becomes a problem when, for instance, the first occurrence is Good and a later one is good: the key in the map carries the original capitalization, so a plain lookup with a differently cased word will not find it, and trying every combination of upper- and lower-case letters grows quickly as words get longer.
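
One way around that, as a rough sketch: keep the counts in a map that compares keys case-insensitively, so the first spelling seen stays as the key and later spellings just bump its count. The names FirstSpellingCounter and countWords are made up for illustration. It also assumes you are free to use a TreeMap and to split on \\W+ yourself, which is not the case for the locked part of the question's code.

```java
import java.util.Map;
import java.util.TreeMap;

class FirstSpellingCounter {

    // Counts words case-insensitively; each key keeps the spelling of its first occurrence.
    static Map<String, Integer> countWords(String phrase) {
        // CASE_INSENSITIVE_ORDER makes "My" and "my" the same key for this map
        Map<String, Integer> counts = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        for (String word : phrase.split("\\W+")) {
            if (word.isEmpty()) {
                continue; // a leading delimiter would produce one empty string
            }
            // put() on an existing key only replaces the value, so the first spelling survives
            counts.put(word, counts.getOrDefault(word, 0) + 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        String phrase = "Good Morning. Welcome to my store. My store is a grocery store.";
        System.out.println(countWords(phrase));
    }
}
```

The TreeMap prints its keys in sorted order rather than HashMap order, so the printed order will differ from the expected output even though the counts (my=2, store=3) match.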

Upvotes: 0

Heena chourasia

Reputation: 1

import java.util.HashMap;
import java.util.Map;

public class CountWords {

    public static void main(String[] args) {
        countWords("java . . . java is text");
    }

    public static void countWords(String str) {
        // Normalize case and drop periods before splitting on whitespace
        str = str.toLowerCase();
        str = str.replace(".", "");
        Map<String, Integer> map = new HashMap<String, Integer>();
        String[] words = str.split("\\s+");
        for (String word : words) {
            if (map.containsKey(word)) {
                map.put(word, map.get(word) + 1);
            } else {
                map.put(word, 1);
            }
        }
        // Print each word with its count
        for (Map.Entry<String, Integer> entry : map.entrySet()) {
            System.out.println(entry.getKey() + " :" + entry.getValue());
        }
    }
}

Upvotes: 0

Andre Compagno

Reputation: 681

I'm going to build on sprinter's answer, since he completely ignored what could and couldn't be changed in the question.

Using as much Java 8 as possible. This doesn't really fit your case, since the map is already initialized and it's odd to create another one and replace it.

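map = Arrays.stream(ignored)
        .filter(s -> !s.isEmpty()) // remove empty strings
        .map(String::toLowerCase) // make all the strings lower case
        // summingInt keeps the values as Integer, so the result fits HashMap<String, Integer>
        .collect(Collectors.groupingBy(Function.identity(), HashMap::new, Collectors.summingInt(s -> 1)));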

Using more basic Java 8 features and using the originally created map.

Arrays.stream(ignored)
        .filter(s -> !s.isEmpty()) // remove empty strings
        .map(String::toLowerCase) // make all the strings lower case
        .forEach(s -> map.put(s, map.getOrDefault(s, 0) + 1));

No Java 8

for (final String s : ignored) {
    if (s.isEmpty()) {
        continue; // skip empty strings
    }
    final String lowerS = s.toLowerCase();
    if (map.containsKey(lowerS)) {
        map.put(lowerS, map.get(lowerS) + 1);
    } else {
        map.put(lowerS, 1);
    }
}

Upvotes: 1

sprinter

Reputation: 27986

A few suggestions for you to consider:

In regular expressions, \W matches anything that isn't a word character (i.e. anything that isn't a letter, digit, or underscore).

If you wish to split on any run of punctuation or spaces, put a + after \W in your regexp. That treats consecutive non-word characters as a single delimiter. Without it you currently get {=2 in your output: there are two instances of ". " in your input, and split interprets each one as delimiter, empty string, delimiter.

It looks as though you want 'my' and 'My' to be considered the same string. In that case you should use toLowerCase before adding them to the map.
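
To see the difference the + makes, here is a small sketch that splits the phrase from the question both ways. The class name SplitDemo and the helper countEmpty are just illustrative names, not anything from the original code.

```java
class SplitDemo {

    // Counts how many empty strings a split produced.
    static int countEmpty(String[] parts) {
        int empty = 0;
        for (String part : parts) {
            if (part.isEmpty()) {
                empty++;
            }
        }
        return empty;
    }

    public static void main(String[] args) {
        String phrase = "Good Morning. Welcome to my store. My store is a grocery store.";

        // Every single non-word character is its own delimiter, so ". " leaves "" in between
        String[] single = phrase.split("\\W");
        // A run of non-word characters is one delimiter, so no empty strings appear
        String[] runs = phrase.toLowerCase().split("\\W+");

        System.out.println(single.length + " tokens, " + countEmpty(single) + " empty");
        System.out.println(runs.length + " tokens, " + countEmpty(runs) + " empty");
    }
}
```

With the phrase from the question, the first split gives 14 tokens, 2 of them empty, and the second gives 12 tokens with none empty. The trailing "." does not add an empty token because split drops trailing empty strings.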

If you are using Java 8 a nice easy way to maintain a running increment in a map is

Map<String,Integer> wordCount = new HashMap<>();
wordCount.put(word, wordCount.getOrDefault(word, 0) + 1);

Again, with Java 8, you can do all of this in one go

Map<String,Long> wordCount = Arrays.stream(phrase.toLowerCase().split("\\W+"))
    .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));

Upvotes: 3

TheLostMind

Reputation: 36304

Your approach is not exactly correct (what if you have other symbols there?). Do this :

  1. Replace all non-alphanumeric characters with spaces.
  2. Split on whitespace (\\s+).
  3. For each string in the split array, check whether the map already has a key equal to the string:
     a. Yes: get the value, increment the count, and put the value back.
     b. No: insert a new key with value 1.
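
As a rough sketch, the steps above could look like this. The class and method names (WordCounter, count) are placeholders, and the trim() call and the empty-input guard are my own additions so that leading punctuation does not produce an empty key. It follows the steps literally, so 'my' and 'My' are still counted separately; call toLowerCase() on the phrase first if you want them merged.

```java
import java.util.HashMap;
import java.util.Map;

class WordCounter {

    static Map<String, Integer> count(String phrase) {
        Map<String, Integer> map = new HashMap<String, Integer>();

        // Step 1: replace every non-alphanumeric character with a space
        String cleaned = phrase.replaceAll("[^A-Za-z0-9]", " ").trim();
        if (cleaned.isEmpty()) {
            return map; // nothing to count
        }

        // Step 2: split on runs of whitespace
        for (String word : cleaned.split("\\s+")) {
            // Step 3: increment an existing key, or insert a new one with value 1
            Integer current = map.get(word);
            if (current != null) {
                map.put(word, current + 1);
            } else {
                map.put(word, 1);
            }
        }
        return map;
    }

    public static void main(String[] args) {
        System.out.println(count("Good Morning. Welcome to my store. My store is a grocery store."));
    }
}
```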

Upvotes: 0
