Reputation: 51
I would like to do the following array extension using SIMD intrinsic. I have two arrays:
cluster value (v_i): 10, 20, 30, 40
cluster length (l_i): 3, 2, 1, 2
I would like to create a resultant array containing the values: v_i repeated for l_i times, i.e:
result: 10, 10, 10, 20, 20, 30, 40, 40.
How can I compute this using SIMD intrinsic?
Upvotes: 2
Views: 363
Reputation: 24667
This may be optimized by SIMD if input array size is up to 8, output array size up to 16, and bytes as array values. At least SSSE3 is required. Extending this approach to larger arrays/elements is possible but efficiency will quickly degrade.
Main loop of the algorithm (binary search) contains PSHUFB, PCMPGTB, and a few arithmetical and logical operations. It is executed log(input_array_size)
times, most likely 2 or 3 times.
Here is an example:
cluster value: 10 20 30 40
cluster length: 3 2 1 2
prefix sum: 0 3 5 6 8
indexes: 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
prefix value: 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
byte number: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
mask: ff ff ff ff ff 0 0 0 0 0 0 0 0 0 0 0
indexes: 1 1 1 1 1 3 3 3 3 3 3 3 3 3 3 3
prefix value: 3 3 3 3 3 6 6 6 6 6 6 6 6 6 6 6
byte number: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
mask: ff ff ff 0 0 ff 0 0 0 0 0 0 0 0 0 0
indexes: 0 0 0 1 1 2 3 3 3 3 3 3 3 3 3 3
length constrained: 0 0 0 1 1 2 3 3 ff ff ff ff ff ff ff ff
cluster value: 10 10 10 20 20 30 40 40 0 0 0 0 0 0 0 0
Upvotes: 2