Reputation: 1734
I was working with long int data and I was trying to determine the smallest element in an array. I know the traditional way of looping through the array to find the minimum. This question is to check if there are other ways to speed it up.
There are some properties of this array which could probably help us speed up things, but I am not sure how.
The array has exactly 8 long int integers. Everytime we call the function, we find a min from the array and the number is replaced by another number and we repeat this step. (at least 8 billion times)
I was thinking of remembering the second largest number somehow for next iteration (Since we will have compared them in the current iteration). Would this be useful compared to the linear implementation of going through the array?
Also sorting is allowed, but we have to somehow remember the original positions using a temporary array. Would this be more effective.
Also is it somehow possible to use SIMD to determine minimum on long ints? Even a millisecond speedup is useful as I am doing this operation billions of times.
Upvotes: 0
Views: 880
Reputation:
It will be hard to get significant speedup on this operation given that N is so small, and the replacement process is inherently sequential. Though in theory a min-heap is the perfect tool, I wouldn't bet on it because of the overhead.
My suggestion would be to keep the array in increasing order and when replacing the minimum to use the insertion step of InsertionSort, i.e. shift the elements to front one by one until the insertion slot is found. You can unroll the code completely to avoid checking the end-of-array condition.
The benefit of keeping the elements sorted is that once you have found the insertion point you can stop searching. On average, you can expect an improvement on the side of the number of comparisons (but an increase in the number of memory moves :-( )
You can also think of a binary search to find the insertion point, taking 3 or 4 comparisons, but I doubt it will clearly beat a linear search.
If your values fit in a 16 bits unsigned integer, you'll be very happy with the _mm_minpos_epu16
instruction.
In a completely paranoid version, you can avoid the unwanted memory moves by numbering the permutations that turn the raw array to a sorted sequence. There are 40320 of them in total (!). Arrange a gigantic hard-coded switch statement in which the linear search is performed in the relevant order given the permutation at hand; then replace the maximum and update the permutation index.
Upvotes: 0
Reputation: 24417
It's possible to do this using SIMD, as you can parallelize up to 4 of the comparisons. The normal algorithm of looping through the array can't be vectorized because each comparison depends on the result of the comparison before it, e.g.
x = min(array[0], array[1])
x = min(x, array[2])
x = min(x, array[3))
...
If you change this to a kind of knock-out tournament approach, you can do several comparisons at once if you load values 0-3 into one vector and values 4-7 into another:
// these 4 ops can be done at once using SIMD
x[0] = min(array[0], array[4])
x[1] = min(array[1], array[5])
x[2] = min(array[2], array[6])
x[3] = min(array[3], array[7])
// so can these 2 ops:
y[0] = min(x[0], x[2])
y[1] = min(x[1], x[3])
z[0] = min(y[0], y[1])
This means that in theory only 3 vectorized comparisons need to be done.
In ARM NEON SIMD, for example, it would look something like this (comparing 8 32 bit values):
vldm r1!, {d0-d3}
vmin.32 q0, q0, q1 // first vectorized comparison
vpmin.32 d0, d0, d1 // second comparison
vpmin.32 d0, d0, d1 // third comparison
// min value is now in d0[0]
In the last comparison you end up doing extra comparisons that you don't need to because it's vectorized, but it doesn't matter.
I've used ARM NEON as an example because I'm not really familiar with x86 SIMD, but the same approach should work and could be extended to 64-bit values, as in this related question
As always, make sure you profile, don't optimize prematurely, yadda yadda yadda
Upvotes: 1
Reputation: 52538
I'd keep a bit of information around and update it.
You have eight values x0 to x7.
Keep values a0 = max (x0, x1), a2 = max (x2, x3), a4 = max (x4, x5), a6 = max (x6, x7), plus remember which one was the largest of each pair.
Keep values b0 = max (a0, a2), b4 = max (a4, a6), and remember which one was the largest of each set.
Now getting the largest element is trivial. When you have it, and insert a new element, you need to update exactly one of the values a0, a2, a4 and a6, and exactly one of b0 and b4.
(Just noticed you were looking for the minimum - shouldn't make much difference).
Upvotes: 0
Reputation: 12641
Try using min-heap. For example
#include <iostream>
#include <algorithm>
#include <array>
using namespace std;
int main() {
array<int, 8> arr { 3, 1, 4, 6, 5, 9, 2, 7 };
make_heap(arr.begin(), arr.end(), greater<int>());
pop_heap(arr.begin(), arr.end());
cout << "Min Element: " << arr.back() << endl;
return 0;
}
Output
1
The naïve way here would be
*min_element(arr.begin(), arr.end());
or probably you could use a multiset
std::multiset<long int> ms { 3, 1, 4, 6, 5, 8, 2, 7 };
for every new_element
ms.erase(ms.begin()); // ms.begin() is the iterator to min element
ms.insert(new_element);
Upvotes: 0
Reputation: 9772
Because it is just eight integers proceed as follows:
Upvotes: 0
Reputation: 76240
The theoretical complexity of an algorithm with an 8 elements array is pretty much irrelevant. Searching linearly is very likely your best option, given cache locality and all.
Another option would be to sort the array in decreasing order once, and then simply replace the first element every time, and eventually shift the new number on the right.
In any case, try and profile.
Upvotes: 5
Reputation: 4343
You could organise the array in the form of a min-heap. Searching will be O(1)
, and replacing will be O(logn)
. This will better the time complexity from O(n)
to O(logn)
, which should be significant.
Upvotes: 0