Tegiri Nenashi
Tegiri Nenashi

Reputation: 3086

Representing binary relation in java

One famous programmer said "why anybody need DB, just give me hash table!". I have list of grammar symbols together with their frequencies. One way it's a map: symbol#->frequency. The other way its a [binary] relation. Problem: get top 5 symbols by frequency.

More general question. I'm aware of [binary] relation algebra slowly making inroad into CS theory. Is there java library supporting relations?

Upvotes: 1

Views: 858

Answers (3)

John B
John B

Reputation: 32969

 List<Entry<String, Integer>> myList = new ArrayList<...>();
 for (Entry<String, Integer> e : myMap.entrySet())
       myList.add(e);

 Collections.sort(myList, new Comparator<Entry<String, Integer>>(){

    int compare(Entry a, Entry b){
       // compare b to a to get reverse order
       return new Integer(b.getValue()).compareTo(new Integer(a.getValue());
    }
 });

 List<Entry<String, Integer>> top5 = myList.sublist(0, 5);

More efficient:

 TreeSet<Entry<String, Integer>> myTree = new TreeSet<...>(
    new  Comparator<Entry<String, Integer>>(){

      int compare(Entry a, Entry b){
         // compare b to a to get reverse order
         return new Integer(b.getValue()).compareTo(new Integer(a.getValue());
      }
    });
 for (Entry<String, Integer> e : myMap.entrySet())
       myList.add(e);

 List<Entry<String, Integer>> top5 = new ArrayList<>();
 int i=0;
 for (Entry<String, Integer> e : myTree) {
     top5.add(e);
     if (i++ == 4) break;
 }

Upvotes: 1

sampson-chen
sampson-chen

Reputation: 47317

Here is a general algorithm, assuming you already have a completed symbol HashTable

  1. Make 2 arrays:
    • freq[5] // Use this to save the frequency counts for the 5 most frequent seen so far
    • word[5] // Use this to save the words that correspond to the above array, seen so far
  2. Use an iterator to traverse your HashTable or Map:
    • Compare the current symbol's frequency against the ones in freq[5] in sequential order.
    • If the current symbol has a higher frequency than any entry in the array pairing above, shift that entry and all entries below it one position (i.e. the 5th position gets kicked out)
    • Add the current symbol / frequency pair to the newly vacated position
    • Otherwise, ignore.

Analysis:

  • You make at most 5 comparisons (constant time) against the arrays with each symbol seen in the HashTable, so this is O(n)
  • Each time you have to shift the entries in the array down, it is also constant time. Assuming you do a shift every time, this is still O(n)

Space: O(1) to store the arrays

Runtime: O(n) to iterate through all the symbols

Upvotes: 0

zch
zch

Reputation: 15278

With TreeSet it should be easy:

int i = 0;
for(Symbol s: symbolTree.descendingSet()) {
    i++;
    if(i > 5) break; // or probably return
    whatever(s);
}

Upvotes: 0

Related Questions