Reputation: 481
I have a stream (or long list of element, could be thousands or millions), and I have to return the first N groups (24 in my situation) sorted by the average of the group. So items are in the form:
{groupId: 1, value: 10}, {groupId: 2, value: 4}, {groupId: 1: value: 2}
and form groups:
{groupId: 1, average: 6}, {groupId: 2: average}
Clearly the naive solution is to iterate, group, sort groups by average and return the first 24 groups. Any idea for an high performance solution that can deal with millions of items?
Upvotes: 1
Views: 941
Reputation: 6086
You cannot escape iterating over the whole list to get each and every member of a given group. Once you have each group available with its mean, you can do the following:
At the end you have all the N first groups in a heap. You can get them in order by applying the last step of a heap sort and reverse the container you obtain (because the heap is a max-heap).
Overall complexity: (where K is the total number of groups and N defined above)
O(N + (K-N).ln(N) + N.ln(N) = O(N + K.ln(N))
Upvotes: 1
Reputation: 80197
Just keep for every group two values - sum of values for that group and counter. In the end divide sum by counter to get average for this group.
You cannot keep info for limited number of groups because any group might become a leader at some moment.
Upvotes: 1