Reputation: 1185
I have a JS application that needs to do a complicated sort of a large array and then display it. Using the built-in array.sort(cb) method can take up to one second with my data, which is long enough to make my UI janky.
Because the UI is only tall enough to show a subset of the sorted array (the rest is below the scroll or paginated), I had an idea: what if I wrote an algorithm that went through the large array and quickly sorted it in such a way that the top N items were perfectly sorted, while the remaining items were only roughly sorted? Each time I ran the algorithm it would sort a little more of the array from the top down.
So I could break up my processing into chunks and have a smooth UI. For the first few seconds the array would not be perfectly sorted, but the imperfections would be below the scroll so they wouldn't be noticed.
My naive solution would be to write my own selection sort with the ability to break after N matches and resume later, but selection sort is a pretty terrible O(n²) algorithm. The faster algorithms (as I understand them) have to run to completion to guarantee that the top N items end up in their final positions.
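For reference, a sketch of the resumable selection sort I have in mind (the function name and chunking interface are just illustrative):

```javascript
// Sketch: selection-sort the next `count` positions starting at `start`,
// so repeated calls extend the perfectly-sorted prefix of the array.
function selectionSortChunk(arr, start, count) {
  var end = Math.min(start + count, arr.length);
  for (var i = start; i < end; i++) {
    var min = i;
    // Scan the unsorted tail for the smallest remaining item.
    for (var j = i + 1; j < arr.length; j++) {
      if (arr[j] < arr[min]) min = j;
    }
    var tmp = arr[i]; arr[i] = arr[min]; arr[min] = tmp;
  }
  return end; // next start position to resume from
}
```

Each call costs roughly count * n comparisons, which is exactly why the overall algorithm is O(n²) and why I'd rather find something better.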
Does anyone know of an existing solution for this? Am I crazy? Any suggestions?
UPDATE
Taking the idea suggested by @moreON, I wrote a custom QuickSort that bails out once it has the required precision. The native sort took 1 second for this data. The regular QuickSort took around 250ms, which is already surprisingly better. The QuickSort that bails out after the first 100 items are sorted took a brisk 10ms, which is much, much better. I can then spend an additional 250ms finishing the sort, but that matters much less because the user is already looking at the data. This reduces the user's perceived delay from 1 second to 10ms, which is pretty great.
//Init 1.8 million random integers into two identical arrays
var arr1 = [];
var arr2 = [];
for (var i = 0; i < 1800000; i++) {
  var num = Math.floor(Math.random() * 1000000);
  arr1.push(num);
  arr2.push(num);
}
console.log(arr1);
//native sort
console.time("native sort");
arr1.sort(function(a,b) { return a-b; });
console.timeEnd("native sort"); //1sec
console.log(arr1);
//quicksort sort Ref: https://www.nczonline.net/blog/2012/11/27/computer-science-in-javascript-quicksort/
function swap(arr, a, b) {
  var temp = arr[a];
  arr[a] = arr[b];
  arr[b] = temp;
}
function cmp(a, b) {
  return (a < b);
}
function partition(items, left, right) {
  var pivot = items[Math.floor((right + left) / 2)];
  var i = left;
  var j = right;
  while (i <= j) {
    while (cmp(items[i], pivot)) i++;
    while (cmp(pivot, items[j])) j--;
    if (i <= j) {
      swap(items, i, j);
      i++;
      j--;
    }
  }
  return i;
}
function quickSort(items, left, right, max) {
  if (max && left - 1 > max) return items; //bail out early once positions 0..max are final
  if (items.length > 1) {
    var index = partition(items, left, right);
    if (left < index - 1) quickSort(items, left, index - 1, max); //left half first, so the sorted prefix grows from the top
    if (index < right) quickSort(items, index, right, max);
  }
  return items;
}
//sort first 100
console.time("partial Quicksort");
arr2 = quickSort(arr2,0,arr2.length-1,100);
console.timeEnd("partial Quicksort"); //10ms
console.log(arr2);
//sort remainder
console.time("finishing Quicksort");
arr2 = quickSort(arr2,100,arr2.length-1); //250ms
console.timeEnd("finishing Quicksort");
console.log(arr2);
Upvotes: 6
Views: 2599
Reputation: 1185
Here is a cleaned-up version of my solution that sorts a large array in batches so the JS thread doesn't stutter. In my example it takes a one-second array.sort(cb) and turns it into five separate ~100ms operations. You'll want to pick the pageSize intelligently based on your data: more pages make the final sort take longer overall, while fewer pages make each batch take longer.
var BatchedQuickSort = {
  swap: function(arr, a, b) {
    var temp = arr[a];
    arr[a] = arr[b];
    arr[b] = temp;
  },
  partition: function(items, left, right, cmp) {
    var pivot = items[Math.floor((right + left) / 2)];
    var i = left;
    var j = right;
    while (i <= j) {
      while (cmp(items[i], pivot) < 0) i++;
      while (cmp(pivot, items[j]) < 0) j--;
      if (i <= j) {
        this.swap(items, i, j);
        i++;
        j--;
      }
    }
    return i;
  },
  sort: function(items, cmp, max, left, right) { //Ref: https://www.nczonline.net/blog/2012/11/27/computer-science-in-javascript-quicksort/
    if (items.length > 1) {
      left = typeof left != "number" ? 0 : left;
      right = typeof right != "number" ? items.length - 1 : right;
      var index = this.partition(items, left, right, cmp);
      if (left < index - 1) this.sort(items, cmp, max, left, index - 1);
      if (index < right && (!max || index <= max)) this.sort(items, cmp, max, index, right);
    }
    return items;
  }
};
//Example Usage
var arr = [];
for (var i = 0; i < 2000000; i++) arr.push(Math.floor(Math.random() * 1000000));
function myCompare(a, b) { return a - b; }
var pageSize = Math.floor(arr.length / 5);
var page = 1;
var timer = window.setInterval(function() {
  arr = BatchedQuickSort.sort(arr, myCompare, pageSize * page, pageSize * (page - 1));
  if (page * pageSize >= arr.length) {
    clearInterval(timer);
    console.log("Done", arr);
  }
  page++;
}, 1);
Upvotes: 1
Reputation: 2739
I think your question boils down to:
How to find top N elements in large array
which is kind of answered here: Find top N elements in an Array
This can be solved by traversing the list once and just picking out the top N elements: Θ(n) for a fixed N.
Check it out here: https://jsfiddle.net/jeeh4a8p/1/
function get_top_10(list) {
  var top10 = new Array(10).fill(-Infinity) //-Infinity so lists containing zeros or negatives work too
  for (var i = 0; i < list.length; i++) {
    var smallest_in_top10 = Math.min.apply(Math, top10)
    if (list[i] > smallest_in_top10) {
      top10.splice(top10.indexOf(smallest_in_top10), 1)
      top10.push(list[i])
    }
  }
  return top10.sort(sortNumber) //return the page in display order
}
function sortNumber(a, b) {
  return a - b;
}
console.log(get_top_10([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]))
var random_list = [];
for (var i = 0; i < 100; i++) {
  random_list.push(Math.round(Math.random() * 999999))
}
console.log(get_top_10(random_list))
Upvotes: 0
Reputation: 23955
If you were to heapify the array, which I believe can be done in O(n) time (https://en.wikipedia.org/wiki/Binary_heap#Building_a_heap), you could then extract the first N items, in order, in O(N log n) time (with n getting smaller as you extract).
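A sketch of that approach in JavaScript, assuming a min-heap over plain numbers (the helper names are mine, not from a library):

```javascript
// Restore the heap property below index i, considering only the first `size` slots.
function siftDown(heap, i, size) {
  while (true) {
    var left = 2 * i + 1, right = left + 1, smallest = i;
    if (left < size && heap[left] < heap[smallest]) smallest = left;
    if (right < size && heap[right] < heap[smallest]) smallest = right;
    if (smallest === i) return;
    var tmp = heap[i]; heap[i] = heap[smallest]; heap[smallest] = tmp;
    i = smallest;
  }
}

// Build a min-heap in place in O(n) by sifting down from the last parent.
function heapify(arr) {
  for (var i = (arr.length >> 1) - 1; i >= 0; i--) siftDown(arr, i, arr.length);
  return arr;
}

// Pop the smallest item in O(log size); caller shrinks `size` after each call.
function extractSmallest(heap, size) {
  var top = heap[0];
  heap[0] = heap[size - 1];
  siftDown(heap, 0, size - 1);
  return top;
}
```

Heapify once up front, then call extractSmallest N times (decrementing the size each call) to produce the first page in sorted order; later pages come from further extractions without re-sorting anything.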
Upvotes: 2
Reputation: 14964
First of all, keep some perspective on the expected performance improvement. Efficient sorting algorithms are O(N * log2(N)); for N = 1,000,000 items, N * log2(N) ≈ N * 20. I doubt you have that many items that you're trying to render in a webpage.
If you only need to render the first 25 rows, selection sort will take N * 25 comparisons to order them, so it will actually perform worse, assuming comparable constant overhead.
If you do want to experiment with this further, one algorithm I can think of is this: maintain a binary tree of the PAGE_SIZE smallest items. Keep updating it with a single pass over the data, removing the largest item whenever a smaller one is found. Ignoring rebalancing, it will take N * log2(PAGE_SIZE) to populate the tree and render your first page of results.
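A sketch of that single-pass idea, using a bounded max-heap instead of a balanced tree (same asymptotics, simpler to write; the function names are mine):

```javascript
// Keep the k smallest items seen so far in a max-heap,
// evicting the current largest whenever a smaller item arrives.
function smallestK(arr, k) {
  var heap = []; // max-heap of at most k items, largest at heap[0]
  function siftUp(i) {
    while (i > 0) {
      var p = (i - 1) >> 1;
      if (heap[p] >= heap[i]) break;
      var t = heap[p]; heap[p] = heap[i]; heap[i] = t;
      i = p;
    }
  }
  function siftDown(i) {
    for (;;) {
      var l = 2 * i + 1, r = l + 1, big = i;
      if (l < heap.length && heap[l] > heap[big]) big = l;
      if (r < heap.length && heap[r] > heap[big]) big = r;
      if (big === i) return;
      var t = heap[big]; heap[big] = heap[i]; heap[i] = t;
      i = big;
    }
  }
  for (var n = 0; n < arr.length; n++) {
    if (heap.length < k) {
      heap.push(arr[n]); siftUp(heap.length - 1); // still filling the page
    } else if (arr[n] < heap[0]) {
      heap[0] = arr[n]; siftDown(0); // evict the largest of the kept items
    }
  }
  return heap.sort(function(a, b) { return a - b; }); // final page, in display order
}
```

The final sort of k items is O(k log k), which is negligible next to the O(N log k) scan when k is a page size.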
Upvotes: -1