aceminer
aceminer

Reputation: 4295

Sorting a list based on frequency of words

I would need to sort a list of words based on its frequency.

My input:

Haha, hehe, haha, haha, hehe, hehe.... , Test

For example in my data structure I would have

Haha:3
Hehe:5
Test:10 

I would need the data structure to be sorted at the output in this manner:

Test:10
Hehe:5
Haha:3

Such that if I pop the top of the data structure I would be able to obtain the element and its corresponding frequency.

The number of elements is unknown initially and hence, an array would not be feasible. If I would like to obtain the top few elements I would just need to access it sequentially. Is this possible in Java?

Upvotes: 0

Views: 2703

Answers (3)

Vietnhi Phuvan
Vietnhi Phuvan

Reputation: 2804

  1. List item

I am starting with the URL below as reference and I will be building on that reference:

How can I count the occurrences of a list item in Python?

Now, the building starts:

>>> from collections import Counter
>>> word_list = ['blue', 'red', 'blue', 'yellow', 'blue', 'red','white','white']
>>> Counter(word_list)
Counter({'blue': 3, 'red': 2, 'white': 2, 'yellow': 1})

Note how Counter(word_list) displays the list of elements i.e. word/frequency pairs sorted in order of decreasing frequency for you. Unfortunately, extracting the words and compiling them in a list sorted in the same order takes a little more work:

(1) Get "size" as the number of elements in the JSON object.

(2) Apply the "most_common" method on the JSON object to get a sorted array of the elements by frequency.

(3) Apply a list comprehension to generate the list of the words extracted from the sorted array.

>>> size = len(Counter(word_list))
4
>>> word_frequency_pairs = Counter(word_list).most_common(size)
>>> word_frequency_pairs
[('blue', 3), ('white', 2), ('red', 2), ('yellow', 1)]
>>> [i[0] for i in word_frequency_pairs]
['blue', 'white', 'red', 'yellow']

There is a reason why I love Python :)

Upvotes: 1

yellowB
yellowB

Reputation: 3020

First, want to confirm: Can you get all the whole words before sorting? Or these words come continuously in a stream?

(1)For the former case, you can use a Set to store the words, then put them into a PriorityQueue. If you implement the comparator function, the queue will sort the words automatically. I create a new class Pair to store the text and frequency, see the code:

import java.util.Queue;
import java.util.PriorityQueue;
import java.util.Set;
import java.util.HashSet;
import java.util.Comparator;

public class PriorityQueueTest {

    public static class Pair {
        private String text;
        private int frequency;

        @Override
        public int hashCode() {
            return text.hashCode();
        }

        @Override
        public String toString() {
            return text + ":" + frequency;
        }

        public Pair(String text, int frequency) {
            super();
            this.text = text;
            this.frequency = frequency;
        }

        public String getText() {
            return text;
        }
        public void setText(String text) {
            this.text = text;
        }
        public int getFrequency() {
            return frequency;
        }
        public void setFrequency(int frequency) {
            this.frequency = frequency;
        }
    }

    public static Comparator<Pair> idComparator = new Comparator<Pair>(){
        @Override
        public int compare(Pair o1, Pair o2) {
            if(o1.getFrequency() > o2.getFrequency()) {
                return -1;
            }
            else if(o1.getFrequency() < o2.getFrequency()){
                return 1;
            }
            else {
                return 0;
            }
        }
    };

    public static void main(String[] args) {
        Set<Pair> data = new HashSet<Pair>();
        data.add(new Pair("haha", 3));
        data.add(new Pair("Hehe", 5));
        data.add(new Pair("Test", 10));

        Queue<Pair> queue = new PriorityQueue(16, idComparator);

        for(Pair pair : data) {
            queue.add(pair);
        }

        // Test the order
        Pair temp = null;
        while((temp = queue.poll()) != null) {
            System.out.println(temp);
        }

    }

}

(2)For the other case(the words come continuously), you may use a TreeMap to keep the order. See ref: http://www.java-samples.com/showtutorial.php?tutorialid=370

Upvotes: 2

juan.facorro
juan.facorro

Reputation: 9930

To keep the information you need, you could create a class that holds your string and the count (e.g. Pair) and keep the instances of this class in a List<Pair>. This approach would make the increment of the count for a given string inefficient since you would have to look for the element that holds the string in linear time (O(N)) and then increment it.

A better approach is to use a Map<String, Integer>, that way the search is done in constant time (O(1)) and then you can sort the elements in the Set<Map.Entry<String, Integer>> returned by Map.entrySet().

Upvotes: 1

Related Questions