Reputation: 4295
I would need to sort a list of words based on its frequency.
My input:
Haha, hehe, haha, haha, hehe, hehe.... , Test
For example in my data structure I would have
Haha:3
Hehe:5
Test:10
I would need the data structure to be sorted at the output in this manner:
Test:10
Hehe:5
Haha:3
Such that if I pop the top of the data structure I would be able to obtain the element and its corresponding frequency.
The number of elements is unknown initially and hence, an array would not be feasible. If I would like to obtain the top few elements I would just need to access it sequentially. Is this possible in Java?
Upvotes: 0
Views: 2703
Reputation: 2804
I am starting with the URL below as reference and I will be building on that reference:
How can I count the occurrences of a list item in Python?
Now, the building starts:
>>> from collections import Counter
>>> word_list = ['blue', 'red', 'blue', 'yellow', 'blue', 'red','white','white']
>>> Counter(word_list)
Counter({'blue': 3, 'red': 2, 'white': 2, 'yellow': 1})
Note how Counter(word_list) displays the list of elements i.e. word/frequency pairs sorted in order of decreasing frequency for you. Unfortunately, extracting the words and compiling them in a list sorted in the same order takes a little more work:
(1) Get "size" as the number of elements in the JSON object.
(2) Apply the "most_common" method on the JSON object to get a sorted array of the elements by frequency.
(3) Apply a list comprehension to generate the list of the words extracted from the sorted array.
>>> size = len(Counter(word_list))
4
>>> word_frequency_pairs = Counter(word_list).most_common(size)
>>> word_frequency_pairs
[('blue', 3), ('white', 2), ('red', 2), ('yellow', 1)]
>>> [i[0] for i in word_frequency_pairs]
['blue', 'white', 'red', 'yellow']
There is a reason why I love Python :)
Upvotes: 1
Reputation: 3020
First, want to confirm: Can you get all the whole words before sorting? Or these words come continuously in a stream?
(1)For the former case, you can use a Set
to store the words, then put them into a PriorityQueue
. If you implement the comparator function, the queue will sort the words automatically. I create a new class Pair
to store the text and frequency, see the code:
import java.util.Queue;
import java.util.PriorityQueue;
import java.util.Set;
import java.util.HashSet;
import java.util.Comparator;
public class PriorityQueueTest {
public static class Pair {
private String text;
private int frequency;
@Override
public int hashCode() {
return text.hashCode();
}
@Override
public String toString() {
return text + ":" + frequency;
}
public Pair(String text, int frequency) {
super();
this.text = text;
this.frequency = frequency;
}
public String getText() {
return text;
}
public void setText(String text) {
this.text = text;
}
public int getFrequency() {
return frequency;
}
public void setFrequency(int frequency) {
this.frequency = frequency;
}
}
public static Comparator<Pair> idComparator = new Comparator<Pair>(){
@Override
public int compare(Pair o1, Pair o2) {
if(o1.getFrequency() > o2.getFrequency()) {
return -1;
}
else if(o1.getFrequency() < o2.getFrequency()){
return 1;
}
else {
return 0;
}
}
};
public static void main(String[] args) {
Set<Pair> data = new HashSet<Pair>();
data.add(new Pair("haha", 3));
data.add(new Pair("Hehe", 5));
data.add(new Pair("Test", 10));
Queue<Pair> queue = new PriorityQueue(16, idComparator);
for(Pair pair : data) {
queue.add(pair);
}
// Test the order
Pair temp = null;
while((temp = queue.poll()) != null) {
System.out.println(temp);
}
}
}
(2)For the other case(the words come continuously), you may use a TreeMap
to keep the order.
See ref: http://www.java-samples.com/showtutorial.php?tutorialid=370
Upvotes: 2
Reputation: 9930
To keep the information you need, you could create a class that holds your string and the count (e.g. Pair
) and keep the instances of this class in a List<Pair>
. This approach would make the increment of the count for a given string inefficient since you would have to look for the element that holds the string in linear time (O(N)) and then increment it.
A better approach is to use a Map<String, Integer>
, that way the search is done in constant time (O(1)) and then you can sort the elements in the Set<Map.Entry<String, Integer>>
returned by Map.entrySet()
.
Upvotes: 1