user1835504

Reputation: 149

How to get duplicates?

I have a method that takes in a list of words. These words are checked against a HashMap of words that has a String as a key and an Integer as a value. The String is a word, and the Integer represents that word's frequency in a text file.

Currently the list of words is ranked according to frequency by putting the words into a TreeMap with the frequency as the key.

However, as there can be no duplicate keys, any word with the same frequency value in the HashMap as an earlier word overwrites that word in the TreeMap, so only one word per frequency survives.

What could I do in order to have a data structure that contains the words ranked by their frequency, including duplicates?

//given a list of words, return a TreeMap of those words ranked by most frequent occurrence
private TreeMap rankWords(LinkedList unrankedWords) {

    //TreeMap to automatically sort words by their frequency, making the frequency count the key.
    TreeMap<Integer, String> rankedWordsMap = new TreeMap<Integer, String>();

    //for each of the words unranked, find that word in the freqMap and add to rankedWords
    for (int i = 0; i < unrankedWords.size(); i++) {

        if (freqMap.containsKey((String) unrankedWords.get(i))) {

            rankedWordsMap.put(freqMap.get((String) unrankedWords.get(i)),
                    (String) unrankedWords.get(i));

        }

    }

    return rankedWordsMap;

}

Upvotes: 2

Views: 163

Answers (6)

A4L

Reputation: 17595

You could use a Set as the value type of your TreeMap. Then you can do the following to add words by frequency to your map:

TreeMap<Integer, Set<String>> rankedWordsMap = new TreeMap<>();

// inside loop
String word = (String) unrankedWords.get(i);
int frequency = freqMap.get(word);
// get the set of words with the same frequency
Set<String> wordSet = rankedWordsMap.get(frequency);
// if it does not exist yet, create it and put it into the map
if(wordSet == null) {
    wordSet = new HashSet<>();
    rankedWordsMap.put(frequency, wordSet);
}
// add the word to set of words
wordSet.add(word);

This way you'll retain all words with the same frequency.
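
For reference, here is a minimal self-contained sketch of that idea. The class name GroupedRanking, the extra freqMap parameter, and the sample words are assumptions made for illustration; a TreeSet is swapped in for the HashSet only so that words sharing a frequency come out in alphabetical order.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical wrapper class, not part of the original question.
class GroupedRanking {

    // Groups the given words by frequency; words missing from freqMap are skipped.
    static TreeMap<Integer, Set<String>> rankWords(List<String> unrankedWords,
                                                   Map<String, Integer> freqMap) {
        TreeMap<Integer, Set<String>> ranked = new TreeMap<>();
        for (String word : unrankedWords) {
            Integer frequency = freqMap.get(word);
            if (frequency == null) {
                continue; // word never appeared in the text
            }
            // get the set of words with the same frequency, creating it on first use
            Set<String> wordSet = ranked.get(frequency);
            if (wordSet == null) {
                wordSet = new TreeSet<>();
                ranked.put(frequency, wordSet);
            }
            wordSet.add(word);
        }
        return ranked;
    }

    public static void main(String[] args) {
        Map<String, Integer> freqMap = new HashMap<>();
        freqMap.put("apple", 3);
        freqMap.put("pear", 3);
        freqMap.put("fig", 1);
        List<String> words = Arrays.asList("apple", "pear", "fig", "kiwi");
        // descendingMap() iterates from the highest frequency to the lowest
        System.out.println(rankWords(words, freqMap).descendingMap());
        // prints {3=[apple, pear], 1=[fig]}
    }
}
```

A TreeMap iterates in ascending key order by default, so descendingMap() is what gives "most frequent first".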

Upvotes: 0

splungebob

Reputation: 5435

Not sure if this would be the most elegant solution, but once your frequency map is complete, you could turn each map entry into an object that represents it:

class Entry {
  String word;
  int frequency;
}

Then you would just write a comparator for that object's frequency/value for sorting.
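
A rough sketch of what that could look like. The class is named WordCount here (an assumed name, chosen to avoid confusion with Map.Entry), and the descending order with an alphabetical tie-break is also an assumption about what "ranked" should mean.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical value class holding one word and its count.
class WordCount {
    final String word;
    final int frequency;

    WordCount(String word, int frequency) {
        this.word = word;
        this.frequency = frequency;
    }

    // Highest frequency first; ties are broken alphabetically so the order is deterministic.
    static final Comparator<WordCount> BY_FREQUENCY_DESC = new Comparator<WordCount>() {
        @Override
        public int compare(WordCount a, WordCount b) {
            int cmp = Integer.compare(b.frequency, a.frequency);
            return cmp != 0 ? cmp : a.word.compareTo(b.word);
        }
    };

    // Converts every map entry into a WordCount and sorts the resulting list.
    static List<WordCount> rank(Map<String, Integer> freqMap) {
        List<WordCount> ranked = new ArrayList<>();
        for (Map.Entry<String, Integer> e : freqMap.entrySet()) {
            ranked.add(new WordCount(e.getKey(), e.getValue()));
        }
        Collections.sort(ranked, BY_FREQUENCY_DESC);
        return ranked;
    }

    @Override
    public String toString() {
        return word + "=" + frequency;
    }

    public static void main(String[] args) {
        Map<String, Integer> freqMap = new HashMap<>();
        freqMap.put("apple", 3);
        freqMap.put("pear", 3);
        freqMap.put("fig", 1);
        System.out.println(rank(freqMap)); // prints [apple=3, pear=3, fig=1]
    }
}
```

Because the result is a List, words with equal frequencies simply sit next to each other and nothing is overwritten.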

Upvotes: 0

Joop Eggen

Reputation: 109613

Make a list of the entries and sort them by the entry values.

List<Map.Entry<String, Integer>> results = new ArrayList<>();
results.addAll(freqMap.entrySet());
Collections.sort(results, new Comparator<Map.Entry<String, Integer>>() {
    @Override
    public int compare(Map.Entry<String, Integer> lhs,
            Map.Entry<String, Integer> rhs) {
        // compareTo avoids the integer overflow that subtraction can cause
        int cmp = lhs.getValue().compareTo(rhs.getValue());
        if (cmp == 0) {
            cmp = lhs.getKey().compareTo(rhs.getKey());
        }
        return cmp;
    }
});

Upvotes: 1

rolfl

Reputation: 17707

Your process is somewhat broken. The contract for a TreeMap requires that the behaviour of the compareTo(...) call never changes for the life of the TreeMap. In other words, you cannot update the factors that change the sort order (like changing the frequency).

My suggestion is to do one of two things:

  • Use two phases: the first calculates the word frequencies (keyed by the word), and the second sorts the words into frequency order.
  • Create custom data structures (perhaps two arrays) that manage the dynamic nature for you.

If performance is not critical, I would probably choose the first. Otherwise, the second option looks like a nice challenge.
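
The two-phase option could be sketched roughly as follows. The class name TwoPhaseRanking, the method names, and the token list standing in for the text file are all assumptions for illustration; the point is only that nothing is sorted until counting has finished.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class illustrating the two phases.
class TwoPhaseRanking {

    // Phase 1: count occurrences, keyed by word. Counts may change freely here.
    static Map<String, Integer> countWords(List<String> tokens) {
        Map<String, Integer> freqMap = new HashMap<>();
        for (String token : tokens) {
            Integer old = freqMap.get(token);
            freqMap.put(token, old == null ? 1 : old + 1);
        }
        return freqMap;
    }

    // Phase 2: once counting is done, sort the distinct words by their final counts.
    static List<String> rank(final Map<String, Integer> freqMap) {
        List<String> words = new ArrayList<>(freqMap.keySet());
        Collections.sort(words, new Comparator<String>() {
            @Override
            public int compare(String a, String b) {
                // most frequent first, alphabetical order on ties
                int cmp = freqMap.get(b).compareTo(freqMap.get(a));
                return cmp != 0 ? cmp : a.compareTo(b);
            }
        });
        return words;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("a", "b", "a", "c", "b", "a");
        System.out.println(rank(countWords(tokens))); // prints [a, b, c]
    }
}
```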

Upvotes: 1

Peter Lawrey

Reputation: 533820

I would start with a Map of String to Integer frequency.

Copy the entrySet() to a List and sort it by frequency.
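
A compact sketch of those two steps, assuming Java 8 or later (for List.sort and Map.Entry.comparingByValue). The class and method names are made up for the example, and sorting from most to least frequent is an assumption about the desired order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class, assuming Java 8+.
class EntryListRanking {

    // Copies the entries into a list and sorts it from most to least frequent.
    static List<Map.Entry<String, Integer>> byFrequency(Map<String, Integer> freqMap) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(freqMap.entrySet());
        entries.sort(Collections.reverseOrder(Map.Entry.comparingByValue()));
        return entries;
    }

    public static void main(String[] args) {
        Map<String, Integer> freqMap = new HashMap<>();
        freqMap.put("apple", 5);
        freqMap.put("pear", 3);
        freqMap.put("fig", 1);
        for (Map.Entry<String, Integer> e : byFrequency(freqMap)) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

Entries with equal counts all stay in the list. Their relative order follows the map's iteration order, which is unspecified for a HashMap, so add a tie-break on the key if that matters.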

Upvotes: 3

Adrian

Reputation: 46572

You should re-think your data structure in order to have unique keys. It sounds like your structure is inverted: it should be a Map of words to counts, not the other way around, as the words are the unique key, and the counts are the value data associated with the keys.

Upvotes: 4
