Reputation: 1370
I have a procedure that draws a scanline with depth testing. It takes xs as the x start, xk as the x end, y, zs as the z-depth at xs, and zk as the z-depth at xk (z changes linearly from xs to xk). A float depth buffer in RAM is used for the depth test.
Here it is
inline void drawScanlineWithDepth(int y, int xs, int xk, float zs, float zk, unsigned color)
{
if(y<0) return; //clip
if(y>=CLIENT_Y) return; //
if(xs>xk) // swap to assure xs is on left xk at right
{
int temp = xs; xs=xk; xk=temp;
float tempp = zs; zs=zk; zk=tempp;
}
if(xs<0) //cut left end to 0
{
if(xk<0) return;
float dod_zs = (-xs)*float(zk-zs)/float(xk-xs);
zs += dod_zs;
xs=0;
}
if(xk>=CLIENT_X) //cut right end to CLIENT_X-1
{
if(xs>=CLIENT_X) return;
float sub_zk = (xk-(CLIENT_X-1))*float(zk-zs)/float(xk-xs);
zk -= sub_zk;
xk = CLIENT_X-1;
}
int len = xk-xs;
int yc = CLIENT_Y-y-1; //reverse y coordinate because blitter reverses it
int adr_ = yc*CLIENT_X + xs;
int adr_depth = ( yc<<12 ) + xs; // depth[] is a static table with 4096 width
float* db = ((float*) depth) + adr_depth;
unsigned* adr = ((unsigned*)pBits) + adr_;
if(len<=3) //unwind short scanlines
{
if(len==0)
{
if(zs< *db) *db = zs, *adr = color;
return;
}
else if(len==1)
{
if(zs< *db) *db = zs, *adr = color; db++; adr++;
if(zk< *db) *db = zk, *adr = color;
return;
}
else if(len==2)
{
float zs_1 = zs + (zk-zs)*0.5f; // depth at the middle pixel: interpolate between zs and zk, not by len
if(zs <*db) *db = zs, *adr = color; db++; adr++;
if(zs_1<*db) *db = zs_1, *adr = color; db++; adr++;
if(zk <*db) *db = zk, *adr = color;
return;
}
else if(len==3)
{
float zs_1 = zs + (zk-zs)*(1.f/3.f); // interpolate between zs and zk, not by len
float zs_2 = zs + (zk-zs)*(2.f/3.f);
if(zs < *db) *db = zs , *adr = color; db++; adr++;
if(zs_1< *db) *db = zs_1 , *adr = color; db++; adr++;
if(zs_2< *db) *db = zs_2 , *adr = color; db++; adr++;
if(zk < *db) *db = zk , *adr = color;
return;
}
}
if(len==0) ERROR_("len == 0");
if(len<0) ERROR_("len < 0");
float dz = float(zk-zs)/float(len);
float z = zs;
for(int i=0; i<=len; i++)
{
if(z < *db) //depthtest
{
*db = z; //set pixel
*adr = color;
}
adr++;
db++;
z+=dz;
}
}
'Unwinding' (unrolling) the loop for short scanlines as above, for lengths of 1, 2, 3 and 4 pixels, makes it faster, but with more unrolling I do not see much improvement. Can it be optimized more?
Upvotes: 0
Views: 53
Reputation: 1479
One really important thing for optimization is to consider what the most important workload is. Is it drawing lots of short(ish) spans or is it drawing mostly really long spans? The main target of optimization is different in each case.
Also, which processor you are using is important: it determines whether branch mispredictions (i.e., branching as a whole) slow you down or not.
Short spans:
One point is that it would be a great idea to move some of the tests (y clipping etc) to be outside this function, to ensure that it doesn't get called with those Y values at all.
Same for swapping the sides; you could also unroll those cases.
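As a rough sketch of that idea (not the asker's actual code), the clipping and the swap can live in a small helper that runs once per span, leaving the inner routine with no range checks at all. All names here (`Span`, `clipSpan`, `drawSpanUnchecked`, `kWidth`, `kHeight`) are hypothetical, and `kWidth`/`kHeight` merely stand in for `CLIENT_X`/`CLIENT_Y`; `depthRow` and `colorRow` are assumed to point at the start of the row being drawn.

```cpp
#include <utility>
#include <vector>

// Hypothetical framebuffer size, standing in for CLIENT_X / CLIENT_Y.
constexpr int kWidth  = 64;
constexpr int kHeight = 48;

// One horizontal span: row y, x range [xs, xk], depth zs at xs and zk at xk.
struct Span { int y, xs, xk; float zs, zk; };

// Order and clip a span once, outside the hot loop.
// Returns false when nothing of the span is visible.
inline bool clipSpan(Span& s)
{
    if (s.y < 0 || s.y >= kHeight) return false;
    if (s.xs > s.xk) { std::swap(s.xs, s.xk); std::swap(s.zs, s.zk); }
    if (s.xk < 0 || s.xs >= kWidth) return false;

    // Depth change per pixel, taken from the unclipped endpoints.
    const float dz = (s.xk > s.xs) ? (s.zk - s.zs) / float(s.xk - s.xs) : 0.0f;
    if (s.xs < 0)       { s.zs += float(-s.xs) * dz;               s.xs = 0; }
    if (s.xk >= kWidth) { s.zk -= float(s.xk - (kWidth - 1)) * dz; s.xk = kWidth - 1; }
    return true;
}

// Inner routine: assumes the span was already ordered and clipped by clipSpan.
inline void drawSpanUnchecked(const Span& s, float* depthRow,
                              unsigned* colorRow, unsigned color)
{
    const int len = s.xk - s.xs;
    const float dz = (len > 0) ? (s.zk - s.zs) / float(len) : 0.0f;
    float z = s.zs;
    for (int x = s.xs; x <= s.xk; ++x, z += dz)
    {
        if (z < depthRow[x]) { depthRow[x] = z; colorRow[x] = color; } // depth test
    }
}
```

A triangle filler could then reject whole rows outside the screen before it ever produces a span, so the y test disappears from the per-span path entirely.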
Long spans:
The answer depends on your compiler and CPU; using SIMD extensions such as SSE would be a great idea. Also, you could unroll the body of the for() loop to do two pixels in a single iteration to win a little bit (unless you can get your compiler to do it for you; have you looked at the assembly output and tweaked the optimizer settings?)
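A minimal sketch of the two-pixels-per-iteration variant, in portable C++ rather than SSE, might look like the following. The function name `fillSpanUnrolled2` and its parameters are hypothetical: `db` and `adr` are assumed to point at the first pixel of an already clipped span, and `count` is the number of pixels (`len + 1` in the question's terms). Whether it actually beats the compiler's own unrolling has to be measured.

```cpp
// Depth-tested fill of `count` consecutive pixels, two per loop iteration.
// z is the depth at the first pixel, dz the depth step per pixel.
inline void fillSpanUnrolled2(float* db, unsigned* adr, int count,
                              float z, float dz, unsigned color)
{
    const float dz2 = dz + dz;   // step for two pixels at once
    float z0 = z;                // depth of the even pixel
    float z1 = z + dz;           // depth of the odd pixel
    int i = 0;
    for (; i + 1 < count; i += 2)
    {
        if (z0 < db[i])     { db[i]     = z0; adr[i]     = color; }
        if (z1 < db[i + 1]) { db[i + 1] = z1; adr[i + 1] = color; }
        z0 += dz2;
        z1 += dz2;
    }
    // Leftover pixel when count is odd.
    if (i < count && z0 < db[i]) { db[i] = z0; adr[i] = color; }
}
```

Keeping two independent depth accumulators (`z0`, `z1`) also shortens the chain of dependent float additions, which is part of why this kind of unrolling can help on long spans.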
Upvotes: 1