Reputation: 647
I have a two-dimensional integer array InArray[2][60] that carries short data in the 2 least-significant bytes of each element and bit-field data in the 2 most-significant bytes. Please suggest a faster method to extract the short data and copy it to a short OutArray[60], something along the lines of memcpy(). I presume iterating through each item is not the optimal way of doing this.
TIA
EDIT: Adding code snippet
    int InArray[2][60];
    short OutArray[60];
    for (int i = 0; i < 60; i++)
    {
        OutArray[i] = (short)(InArray[0][i] & 0xffff);
    }
Is there a better and possibly faster way of doing this?
Upvotes: 0
Views: 178
Reputation:
This is only going to help if you're doing something like this many times. I used Agner Fog's vectorclass to do this (http://www.agner.org/optimize/vectorclass.zip). It is a C++ class library for using SSE/AVX. You'll probably get better answers if you add the SSE and AVX tags to your question.
You'll also get better results if you can ensure the arrays are 16-byte or 32-byte aligned. In the code below it would also help either to make the width of the arrays equal to 64 (even if you are only going to use 60 elements) or to make the length of the array a multiple of 64.
    #include <stdio.h>
    #include "vectorclass.h"

    // Scalar reference: keep the low 16 bits of each int.
    void foo(int InArray[2][60], short OutArray[60]) {
        for (int i = 0; i < 60; i++) {
            OutArray[i] = (short)(InArray[0][i] & 0xffff);
        }
    }

    // Vectorized version: produces 8 shorts per iteration.
    void foo_vec8s(int InArray[2][60], short OutArray[60]) {
        int i = 0;
        for (; i + 8 <= 60; i += 8) {
            // Each load reads 16 bytes, i.e. 4 ints viewed as 8 shorts.
            Vec8s v1 = Vec8s().load(&InArray[0][i]);
            Vec8s v2 = Vec8s().load(&InArray[0][i+4]);
            // Even indices pick the low half of each int (little-endian).
            Vec8s out = blend8s<0,2,4,6,8,10,12,14>(v1, v2);
            out.store(&OutArray[i]);
        }
        // Scalar clean-up: 60 is not a multiple of 8, so 4 elements remain.
        for (; i < 60; i++) {
            OutArray[i] = (short)(InArray[0][i] & 0xffff);
        }
    }

    int main() {
        int InArray[2][60];
        // Set the upper 16 bits to check that they are discarded.
        for (int i = 0; i < 60; i++) {
            InArray[0][i] = i | 0xffff0000;
        }

        short OutArray1[60] = {0};
        foo(InArray, OutArray1);
        for (int i = 0; i < 60; i++) {
            printf("%d ", OutArray1[i]);
        }
        printf("\n");

        short OutArray2[60] = {0};
        foo_vec8s(InArray, OutArray2);
        for (int i = 0; i < 60; i++) {
            printf("%d ", OutArray2[i]);
        }
        printf("\n");
    }
Upvotes: 1
Reputation: 2576
If you really are copying a 60-element array, then it does not matter.
If the array is larger and/or you are doing it many times, then you'll want to have a look at SIMD instruction sets: SSE on x86 platforms, AltiVec on PowerPC, and so on.
For instance, with SSE4.1 you can use _mm_packus_epi32(), which packs two vectors of four signed 32-bit values into eight unsigned 16-bit values with saturation. Because it saturates, you need to mask off the upper 16 bits of each element first; otherwise the bit-field data would clamp the results.
Your compiler probably has intrinsics for these instructions (see http://msdn.microsoft.com/en-us/library/hh977022.aspx and http://gcc.gnu.org/onlinedocs/gcc-3.3.6/gcc/PowerPC-AltiVec-Built_002din-Functions.html for examples).
Upvotes: 2