frankiek3
frankiek3

Reputation: 88

Sorting/reordering dependent instructions for dual issue processing

I attempted to write a sort algorithm to reorder instructions for a dual issue processor (Cell SPU). One way to obtain dual issue processing an instruction should not depend on the instruction that precedes it (another involves separate pipelines, but I'm focused on instructions in the same pipeline). I understand this would be too much for the compiler to do and I didn't find what I needed when searching. This can be done by hand in most cases but a sorting algorithm should ensure the lowest "sequence count" (number or dependent instructions that follow each-other).

My question is has this or something similar been done before? Is there an optimized approach?

Simple example pseudo-code halving instruction time (inputs: i1, i2, i3):

v1 = i1 ^ i2; - #single-issued
v2 = v1 | i2; \ #v2,v3 dual-issued
v3 = i1 & i3; / #v2,v3 dual-issued
v4 = v3 & i2; - #single-issued

can be written as:

v1 = i1 ^ i2; \ #v1,v3 dual-issued
v3 = i1 & i3; / #v1,v3 dual-issued
v2 = v1 | i2; \ #v2,v4 dual-issued
v4 = v3 & i2; / #v2,v4 dual-issued

Here is a python implementation I created that recursively reorders the instructions to achieve the lowest "sequence count".

reorder.py
http://pastebin.com/dt8eWy3H

sample t8-1.h
http://pastebin.com/w0DYg8ff

Upvotes: 1

Views: 296

Answers (2)

frankiek3
frankiek3

Reputation: 88

I ended up using the 'Assembly Visualizer' asmVis.jar java program to view the sections of assembly that could be optimized, and I reordered the instructions manually. I increased the speed of the assembly function a great deal using both the odd and even side during almost every instruction cycle (dual issued instructions).

TODO: add github link to source

Upvotes: 1

Chris
Chris

Reputation: 3987

While I can't speak specifically for the Cell, code scheduling is ABSOLUTELY something the compiler should be doing for you.

Compilers will re-order instructions, pad in NOPS as required, and do everything it can to provide a good code schedule for you. Normally, I'd tell you to look at the "mtune" parameters for your compiler (they allow you to tell your compiler exactly what your processor looks like), but since you're coding for the Cell it should already know what to do (but do check the compiler manual just to be sure).

A brief glance at the GCC compiler for the SPU here shows options such as:

-mdual-nops=n
    By default, GCC inserts nops to increase dual issue when 
    it expects it to increase performance. n can be a value from 
    0 to 10. A smaller n inserts fewer nops. 10 is the default, 0 
    is the same as -mno-dual-nops. Disabled with -Os. `

As a programmer, it is your job to provide enough "ILP" in your code to get good scheduling. Try to avoid branches, avoid putting long latency operations on the critical path, etc., and you should be fine. Analyze the objdump of your critical loops to verify the code is being scheduled as you desire. The compiler is very smart, but it may require a little coaxing.

Upvotes: 1

Related Questions