Lokn
Lokn

Reputation: 431

Which datastructure to use?

I have a scenario like this, I need to store the count of the strings and I need to return the top ten strings which has the max count,

for example,

String    Count
---------------------------------
String1   10
String2   9
String3   8 
.
.
.
String10  1 

I was thinking of using the hashtable for storing the string and its count but it would be difficult to retrieve the top ten strings from it as I have to again loop through it to find them.

Any other suggestions here?

Upvotes: 1

Views: 156

Answers (6)

Ambar
Ambar

Reputation: 132

You need a Max Heap data structure for this. Put it all into the max heap, and make successive 10 (or whatever n) removals.

And if you intend to keep reusing the data once it's loaded into memory, it might be worth the expense of sorting by value instead of a heap.

Upvotes: 0

Andreas Dolk
Andreas Dolk

Reputation: 114757

Simply use a sorted map like

Map<Integer, List<String>> strings

where the key is the frequency value and the value the list of strings that occur with that frequency.

Then, loop through the map and, with an inner loop through the value lists until you've seen 10 strings. Those are the 10 most frequent one's.


With the additional requirement that the algorithm should support updates of frequencies: add the strings to a map like Map<String, Integer> where the key is the string and the value the actual frequency (increment the value if you see a string again). After that copy the key/value pairs to the map I suggested above.

Upvotes: 3

Matt
Matt

Reputation: 11805

Guava has a HashMultiset which would be very useful for this.

HashMultiset<String> ms = Hashmultiset.create();
ms.add(astring);
ms.add(astring, times);


ImmutableMultiset<String> ims = Multisets.copyHighestCountFirst(ms);

// iterator through the first 10 elements, and they will be your top 10
// from highest to lowest.

Upvotes: 0

Andrey Borisov
Andrey Borisov

Reputation: 3170

I am not sure, but i guess that the most suitable and elegant class for your needs is guava's http://docs.guava-libraries.googlecode.com/git/javadoc/com/google/common/collect/TreeMultiset.html

Upvotes: 0

ffriend
ffriend

Reputation: 28482

For any task like "find N top items" priority queues are perfect solution. See Java's PriorityQueue class.

Upvotes: 0

Brinnis
Brinnis

Reputation: 906

Priority Que.

You can make a class to put in it:

public class StringHolder{
    private String string;
    private int value;

    //Compare to and equals methods
}

then it will sort when you insert and it is easy to get the top 10.

Upvotes: 4

Related Questions