Reputation: 155
In a per-interview question i have been asked that "How would you sort a list of a billion students, based on their overall score on a test? The roll number of student’s moves from 1-1B and range of marks is 10-100." Although any sorting algorithm would do but what would be the efficient one?
Upvotes: 2
Views: 602
Reputation: 19
I will use some sort of divide and conquer algorithm (such as merge sort or quick sort or bucket sort) and use this idea to divide the sorting between several buckets. The problem arises when you need to merge all the data together back to a big array, but that only takes O(n) since the sub-arrays are already sorted.
bucket sort(L)
{
list Y[k+1]
for (i = 0; i <= k; i++) Y[i] = empty
while L nonempty
{
let X = first record in L
move X to Y[key(X)]
}
for (i = 0; i <= k; i++)
concatenate Y[i] onto end of L
}
There are two loops taking O(k) time, and one taking O(n), so the total time is O(n+k). This is good when k is smaller than n. E.g. suppose you want to sort 1,000,000,000 people by scores; n=1000000000, k=100-10, so time = O(n).
Upvotes: 0
Reputation: 711
Use count sort . It is good if you know the max value and some other parameters which are satisfied in this question. It sorts in O(n)
Upvotes: 0
Reputation: 22555
Simply run counting sort on input, it's O(n)
in this case, because range is bounded. Also it's most efficient way because any way to output all of the students takes Ω(n).
You can outputs student by looping on them for possible available scores (e.g if 90 possible scores exists, loop through students 90 times, in the first time output students with score 100, ....).
This task can be done by bucket sort. But first you should loop through the inputs, find each scores related student count, after that create a bucket for each score by considering its student count, then filling the bucket, note that you should create an array for bucket, also you should have an extra parameter which saves current item count in each bucket.
First approach (using counting sort directly) is O(n) with O(1) extra space, second approach is O(n) with O(n) extra spaces, but second one is faster, because it's 2*n
, and first one is 90*n
.
Upvotes: 6