Reputation: 1518
Working on the following algorithm:
Given a non-empty array of integers, return the k most frequent elements.
For example, Given [1,1,1,2,2,3] and k = 2, return [1,2].
Note: You may assume k is always valid, 1 ≤ k ≤ number of unique elements. Your algorithm's time complexity must be better than O(n log n), where n is the array's size.
My initial impulse is to use a hash table for number as the key and the value as the number of occurrences. Then, I can insert each key-value pair as a node into a maxHeap and simply delete max until k == 0.
Is building a node and inputting into a maxHeap the right way to solve with that approach? Note I'm not curious about a more optimal solution-- curious about whether that's the way I'd implement the idea of using maxHeap to repeatedly find the number with the maximum number of occurrences. The node part seems like overkill but I'm not sure how else to do it.
Upvotes: 5
Views: 5635
Reputation: 913
Answer:
code:
input = [1,1,1,2,2,3]
k = 2
def most_frequent(arr, num = 1)
arr.group_by(&:itself).sort_by {|_,s| -s.length}.first(num).map(&:first)
end
most_frequent(input, k)
output:
=> [1, 2]
MaxHeap Answer
require "rubygems"
require "algorithms"
include Containers
input = [1,1,1,2,2,3]
k = 2
def heap_most_frequent(arr, num = 1)
max_heap = MaxHeap.new(arr.group_by(&:itself).map{|n,ns| [ns.count,n]})
(1..num).each_with_object([]) { |i, result| result << max_heap.pop[1]}
end
Benchmark:
user system total real
orig: 0.050000 0.000000 0.050000 (0.057158)
heap: 0.110000 0.000000 0.110000 (0.112387)
Summary
Most of the work goes in to making the hash, the Heap in this instance just complicates things when dealing with key, value pairs.
Upvotes: 5
Reputation: 211560
You can always do this with a few O(n) transforms with one hash-table intermediate:
def max_in_list(list, n = 1)
list.group_by { |v| v }.sort_by do |_, s|
-s.length
end.first(n).map(&:first)
end
numbers = [ 1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 5, 5, 6 ]
max_in_list(numbers, 2)
# => [1, 2, 4]
Unfortunately max_by
doesn't respect ordering when requesting more than one entry. It just gives the top N without any concern for rank.
Upvotes: 2