Reputation: 8202
I'm working with some piece of hardware (the hardware itself is not important) and I need to split some block data into separate pieces in order to make the thing run faster.
So I have, for example, a contiguous block of memory X words long. For visualization, I'm arranging it into 50-word lines below:
001 002 003 004 005 006 007 ...
051 052 053 054 055 056 057 ...
101 102 103 104 105 106 107 ...
151 152 153 154 155 156 157 ...
I need a fast way of splitting these into four separate blocks:
Block1
001 003 005 007 ...
101 103 105 107 ...
Block2
002 004 006 ...
102 104 106 ...
Block3
051 053 055 057 ...
151 153 155 157 ...
Block4
052 054 056 ...
152 154 156 ...
Or, basically:
Block1 Block2 Block1 Block2 ...
Block3 Block4 Block3 Block4 ...
Block1 Block2 Block1 Block2 ...
Block3 Block4 Block3 Block4 ...
Now doing this is as simple as using for-loops. But what is a more optimized/parallel way of doing this? (No MPI stuff; this happens in an app running on the desktop.)
So summing it up, just to be clear:
I have data as shown above.
I'm sending this data to several devices (outside the PC). This data needs to be sent down the wire as 4 separate blocks (to the separate devices).
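For reference, the straightforward for-loop version I mean is roughly the following (a sketch only, assuming 32-bit words, a row-major source buffer, and pre-allocated destination buffers; all names are illustrative):

/* Naive split: even rows feed Block1/Block2, odd rows feed Block3/Block4,
 * with even columns going to the first block of each pair. */
void split_blocks(const int *src, int rows, int cols,
                  int *b1, int *b2, int *b3, int *b4)
{
    int n1 = 0, n2 = 0, n3 = 0, n4 = 0;
    for (int r = 0; r < rows; ++r)
        for (int c = 0; c < cols; ++c)
        {
            int w = src[r * cols + c];
            if (r % 2 == 0) {
                if (c % 2 == 0) b1[n1++] = w; else b2[n2++] = w;
            } else {
                if (c % 2 == 0) b3[n3++] = w; else b4[n4++] = w;
            }
        }
}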
Upvotes: 0
Views: 279
Reputation: 179819
This is a prime example of where SSE can help you. It's very good at data shuffling, as well as streaming data from memory and back. On some non-x86 architectures, similar ISA extensions are available (e.g. AltiVec).
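For example, one row can be split into its even- and odd-column halves with a couple of SSE2 shuffles and streaming stores. A minimal sketch (not from the original answer), assuming 32-bit words, a row length that is a multiple of 8, and 16-byte-aligned destination buffers; the function name is illustrative:

#include <emmintrin.h>   /* SSE2 intrinsics */

static void split_row_sse2(const int *src, int *even_dst, int *odd_dst, int n)
{
    /* Each 8-word chunk yields 4 even-column and 4 odd-column words,
     * hence the i/2 destination index. */
    for (int i = 0; i + 8 <= n; i += 8)
    {
        __m128i a = _mm_loadu_si128((const __m128i *)(src + i));     /* a0 a1 a2 a3 */
        __m128i b = _mm_loadu_si128((const __m128i *)(src + i + 4)); /* a4 a5 a6 a7 */

        /* Reorder each register so the even-indexed words come first. */
        __m128i sa = _mm_shuffle_epi32(a, _MM_SHUFFLE(3, 1, 2, 0));  /* a0 a2 a1 a3 */
        __m128i sb = _mm_shuffle_epi32(b, _MM_SHUFFLE(3, 1, 2, 0));  /* a4 a6 a5 a7 */

        /* Combine the even halves and the odd halves. */
        __m128i evens = _mm_unpacklo_epi64(sa, sb);                  /* a0 a2 a4 a6 */
        __m128i odds  = _mm_unpackhi_epi64(sa, sb);                  /* a1 a3 a5 a7 */

        /* Streaming stores bypass the cache, which helps when the blocks
         * are written once and then shipped off to a device. */
        _mm_stream_si128((__m128i *)(even_dst + i / 2), evens);
        _mm_stream_si128((__m128i *)(odd_dst  + i / 2), odds);
    }
    _mm_sfence(); /* make the streaming stores globally visible */
}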
Upvotes: 1
Reputation: 156158
EDIT: It sounds like you're passing the data to an external interface. If this is anything as slow as a gigabit ethernet interface, then the bottleneck will be at the wire and not how quickly you can compose data. Just iterate over the data to build your blocks in any manner that seems convenient for your code.
Perhaps what you want to do is pass the blocks around using an offset/stride notation. Essentially, each block is described by its starting address, the offset into the block where its first element appears, the number of words between consecutive elements, and the number of words between rows. So, something like:
Block     1    2    3    4
base      0    0   50   50
first     0    1    0    1
offset    2    2    2    2
stride  100  100  100  100
So you could work on the data in parallel (assuming you don't have to worry about writes) with something like this:
struct Block {
    int base;    /* word index where the block's first row starts       */
    int first;   /* offset of the block's first element within that row */
    int offset;  /* words between consecutive elements in a row         */
    int stride;  /* words between consecutive rows of the block         */
    int cols;
    int rows;
};

/* given some reasonable block[n] and buffer */
for (int row = 0; row < block[n].rows; ++row)
    for (int col = 0; col < block[n].cols; ++col)
    {
        int cell = buffer[block[n].base +
                          block[n].first +
                          row * block[n].stride +
                          col * block[n].offset];
        doSomething(cell);
    }
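One way to spread that work across cores is to give each of the four blocks its own thread, e.g. with an OpenMP loop (an assumption; the answer doesn't name a threading mechanism). The sketch below gathers each block into its own contiguous output buffer, ready to be sent to its device; the threads only read the shared buffer and write disjoint outputs, so no locking is needed. Compile with OpenMP enabled (e.g. -fopenmp for GCC/Clang).

/* Gather all four blocks in parallel, using the struct Block above.
 * out[n] must be large enough for block[n].rows * block[n].cols words. */
void gather_blocks(const int *buffer, const struct Block block[4], int *out[4])
{
    #pragma omp parallel for
    for (int n = 0; n < 4; ++n)
    {
        int k = 0;
        for (int row = 0; row < block[n].rows; ++row)
            for (int col = 0; col < block[n].cols; ++col)
                out[n][k++] = buffer[block[n].base + block[n].first +
                                     row * block[n].stride +
                                     col * block[n].offset];
    }
}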
Upvotes: 0