Hamza Dabjan

Reputation: 49

Conflict when generating two HashMaps in Java

I have two texts. For each one I tokenize it, remove stop words, lemmatize the tokens, and then generate a HashMap that maps each lemma to its frequency in the text.

When I apply the steps above to one text, everything works fine, as shown below:

    String train = "hamza was studying hamza studied yesterday";
    String test = "hamza is swimming today";

    sportBag = bagOfWords.Bag(lemmatize.Lemmatize(tokenizeAndRemoveStopWords.RemoveStopWords(train)));

    //testBag = bagOfWords.Bag(lemmatize.Lemmatize(tokenizeAndRemoveStopWords.RemoveStopWords(test)));
    System.out.println("hashmap for train");

    for (String name : sportBag.keySet()) {
        String key = name;
        int value = sportBag.get(name);
        System.out.println(key + " " + value);
    }

    System.out.println("hashmap for test");
    for (String name : testBag.keySet()) {
        String key = name;
        int value = testBag.get(name);
        System.out.println(key + " " + value);
    }

The output is as expected:

hashmap for train
yesterday 1
study 2
hamza 2

The problem happens when I generate two HashMaps, as below:

    String train = "hamza was studying hamza studied yesterday";
    String test = "hamza is swimming today";

    sportBag = bagOfWords.Bag(lemmatize.Lemmatize(tokenizeAndRemoveStopWords.RemoveStopWords(train)));

    testBag = bagOfWords.Bag(lemmatize.Lemmatize(tokenizeAndRemoveStopWords.RemoveStopWords(test)));
    System.out.println("hashmap for train");

    for (String name : sportBag.keySet()) {
        String key = name;
        int value = sportBag.get(name);
        System.out.println(key + " " + value);
    }

    System.out.println("hashmap for test");
    for (String name : testBag.keySet()) {
        String key = name;
        int value = testBag.get(name);
        System.out.println(key + " " + value);
    }

Here the problem appears: both HashMaps contain the words from both texts.

hashmap for train
yesterday 1
swimming 1
study 2
today 1
hamza 2
hashmap for test
yesterday 1
swimming 1
study 2
today 1
hamza 2

Here is the Bag method:

public Map<String, Integer> words = new HashMap<>();

/**
 * Counts how often each word occurs in the given list.
 *
 * @param wordsList the words to count
 * @return a map from each word to its frequency
 */
public Map<String, Integer> Bag(List<String> wordsList) {
    for (int i = 0; i < wordsList.size(); i++) {
        int freq = 0;
        for (int j = 0; j < wordsList.size(); j++) {
            if (wordsList.get(j).equals(wordsList.get(i))) {
                freq++;
            }
        }
        if (!words.containsKey(wordsList.get(i))) {
            words.put(wordsList.get(i), freq);
        }
    }
    return words;
}

Why does this happen?

Upvotes: 0

Views: 57

Answers (1)

CliveLewis

Reputation: 74

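You are using the same bagOfWords instance for both sportBag and testBag. The .Bag method never clears its words map and returns that same map on every call, so the second call adds the test words to the entries left over from the first call. As a result, sportBag and testBag refer to one and the same map, which is why both loops print identical content.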
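
To make the shared reference visible, here is a small sketch. The class names SharedMapDemo and SharedBag are made up for illustration, and the counting logic is simplified to Map.merge rather than copied from your nested loops; only the shared field and the returned reference mirror your code. It assumes Java 9 or later because of List.of.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SharedMapDemo {

    // Hypothetical stand-in for the question's class: one map field reused by every call.
    static class SharedBag {
        final Map<String, Integer> words = new HashMap<>();

        Map<String, Integer> bag(List<String> wordsList) {
            for (String word : wordsList) {
                // Simplified counting: 1 for a new word, otherwise existing count + 1.
                words.merge(word, 1, Integer::sum);
            }
            return words; // returns the field itself, not a copy
        }
    }

    public static void main(String[] args) {
        SharedBag bagOfWords = new SharedBag();
        Map<String, Integer> sportBag = bagOfWords.bag(List.of("hamza", "study"));
        Map<String, Integer> testBag = bagOfWords.bag(List.of("swimming", "today"));

        System.out.println(sportBag == testBag); // true: both variables point to the same map
        System.out.println(sportBag.size());     // 4: words from both calls ended up together
    }
}
```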

You have 2 options here:

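  • Create a new map at the start of the .Bag() method (a local variable instead of the field). Note that only clearing the shared map is not enough: sportBag and testBag would still refer to the same map, which would then hold just the words from the last call.
  • Create a new instance of the bagOfWords class every time you need to generate a HashMap.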
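
A minimal sketch of a Bag-style method that builds a fresh local map on every call, so that two results never share state. The class name BagOfWordsSketch and the static bag method are hypothetical, the word lists are hard-coded in place of your tokenize/lemmatize pipeline, and I have swapped the nested loops for a single pass with Map.merge, which should give the same counts. It assumes Java 9 or later because of List.of.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class BagOfWordsSketch {

    // Builds a new map per call, so maps returned by separate calls are independent.
    static Map<String, Integer> bag(List<String> wordsList) {
        Map<String, Integer> words = new HashMap<>();
        for (String word : wordsList) {
            // merge() stores 1 for a new word, otherwise adds 1 to the existing count.
            words.merge(word, 1, Integer::sum);
        }
        return words;
    }

    public static void main(String[] args) {
        // Already-lemmatized tokens, hard-coded here instead of running the real pipeline.
        Map<String, Integer> sportBag = bag(List.of("hamza", "study", "hamza", "study", "yesterday"));
        Map<String, Integer> testBag = bag(List.of("hamza", "swimming", "today"));

        System.out.println("hashmap for train: " + sportBag);
        System.out.println("hashmap for test: " + testBag);
    }
}
```

With this shape, the train map holds only hamza, study and yesterday, and the test map holds only hamza, swimming and today. The print order may vary because HashMap does not guarantee iteration order.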

Upvotes: 1
