Reputation: 34564
I have an array of structs
called struct Test testArray[25]
.
The Test struct
contains a member called int size
.
What is the fastest way to get another array of Test structs
that contain all from the original excluding the 5 largest, based on the member size
? WITHOUT modifying the original array.
NOTE: Amount of items in the array can be much larger, was just using this for testing and the values could be dynamic. Just wanted a slower subset for testing.
I was thinking of making a copy of the original testArray
and then sorting that array. Then return an array of Test structs
that did not contain the top 5 or bottom 5 (depending on asc or desc).
OR
Iterating through the testArray
looking for the largest 5 and then making a copy of the original array excluding the largest 5. This way seems like it would iterate through the array too many times comparing to the array of 5 largest that had been found.
Follow up question:
Here is what i am doing now, let me know what you think?
Considering the number of largest elements i am interested in is going to remain the same, i am iterating through the array and getting the largest element and swapping it to the front of the array. Then i skip the first element and look for the largest after that and swap it into the second index... so on so forth. Until i have the first 5 largest. Then i stop sorting and just copy the sixth index to the end into a new array.
This way, no matter what, i only iterate through the array 5 times. And i do not have to sort the whole thing.
Upvotes: 1
Views: 1268
Reputation: 5325
Partial Sorting with a linear time selection algorithm will do this in O(n) time, where sorting would be O(nlogn).
To quote the Partial Sorting page:
The linear-time selection algorithm described above can be used to find the k smallest or the k largest elements in worst-case linear time O(n). To find the k smallest elements, find the kth smallest element using the linear-time median-of-medians selection algorithm. After that, partition the array with the kth smallest element as pivot. The k smallest elements will be the first k elements.
You can find the k largest items in O(n), although making a copy of the array or an array of pointers to each element (smarter) will cost you some time as well, but you have to do that regardless.
If you'd like me to give a complete explanation of the algorithm involved, just comment.
Update: Regarding your follow up question, which basically suggests iterating over the list five times... that will work. But it iterates over the list more times than you need to. Finding the k largest elements in one pass (using an O(n) selection algorithm) is much better than that. That way you iterate once to make your new array, and once more to do the selection (if you use median-of-medians, you will not need to iterate a third time to remove the five largest items as you can just split the working array into two parts based on where the 5th largest item is), rather than iterating once to make your new array and then an additional five times.
Upvotes: 3
Reputation: 5692
With usage of min-heap
data structure and set heap size to 5
, you can traverse the array and insert into heap when the minimum element of heap is less than the element in the array.
getMin
takes O(1)
time and insertion takes O(log(k))
time where k
is the element size of heap (in our case it is 5
). So in the worst case we have complexity O(n*log(k))
to find max 5 elements. Another O(n)
will take to get the excluded list.
Upvotes: 0
Reputation: 44
Maybe an array isn't the best structure for what you want. Specially since you need to sort it every time a new value is added. Maybe a linked list is better, with a sort on insert (which is O(N) on the worst case and O(1) in the best), then just discard the last five elements. Also, you have to consider that just switching a pointer is considerably faster than reallocating the entire array just get another element in there.
Why not an AVL Tree? Traverse time is O(log2N), but you have to consider the time of rebalancing the tree, and if the time spent coding that is worth it.
Upvotes: 0
Reputation: 9007
As stated sorting is O(nlogn +5) iterating in O(5n + 5). In the general case finding m largest numbers is O(nlog +m) using the sort algorithm and O(mn +m) in the iteration algoritm. The question of which algorithm is better depends on the values of m and n. For a value of five iterating is better for up to 2 to the 5th numbers I.e. a measly 32. However in terms of operations sorting is more intensive than iterating so it'll be quite a bit more until it is faster.
You can do better theoretically by using a sorted srray of the largest numbers so far and binary search to maintain the order that will give you O(nlogm) but that again depends on the values of n and m.
Upvotes: 0