agori
agori

Reputation: 481

Efficient way to group, sort and return first N results

I have a stream (or long list of element, could be thousands or millions), and I have to return the first N groups (24 in my situation) sorted by the average of the group. So items are in the form:

{groupId: 1, value: 10}, {groupId: 2, value: 4}, {groupId: 1: value: 2}

and form groups:

{groupId: 1, average: 6}, {groupId: 2: average}

Clearly the naive solution is to iterate, group, sort groups by average and return the first 24 groups. Any idea for an high performance solution that can deal with millions of items?

Upvotes: 1

Views: 941

Answers (2)

Rerito
Rerito

Reputation: 6086

You cannot escape iterating over the whole list to get each and every member of a given group. Once you have each group available with its mean, you can do the following:

  1. Take the N first groups into a vector/array.
  2. Make a heap out of that array such that the top of the heap is the group with the maximum average.
  3. For each remaining group, compare it with the top of the heap:
    • If the current group is greater than the top of the heap, discard it
    • If it is smaller, pop the top of the heap and insert the current group

At the end you have all the N first groups in a heap. You can get them in order by applying the last step of a heap sort and reverse the container you obtain (because the heap is a max-heap).

Overall complexity: (where K is the total number of groups and N defined above)

O(N + (K-N).ln(N) + N.ln(N) = O(N + K.ln(N))

  • The term N comes from taking the first N groups and making the initial max-heap.
  • The term (K-N).ln(N) comes from the (delete top / insert current group) pairs of operations (at most K - N of them).
  • The last term (N.ln(N)) is for the sorting of the final heap.

Upvotes: 1

MBo
MBo

Reputation: 80197

Just keep for every group two values - sum of values for that group and counter. In the end divide sum by counter to get average for this group.

You cannot keep info for limited number of groups because any group might become a leader at some moment.

Upvotes: 1

Related Questions