Reputation: 63250
I have an implementation of a Inverse Discrete Cosine Transform and I'm trying to figure out how they got to this code. So far, I've figured out that this is probably an optimized implementation of the Cooley-Tukey radix-2 Decimation-in-time for a DCT instead of a DFT (Discrete Fourier Transform).
However, I'm still at a loss about exactly what happens at each stage. I figured that the Wx constants are probably the twiddle factors.
Can anybody provide a reference to an explanation, or provide some explanation to this code?
//Twiddle factors
#define W1 2841 /* 2048*sqrt(2)*cos(1*pi/16) */
#define W2 2676 /* 2048*sqrt(2)*cos(2*pi/16) */
#define W3 2408 /* 2048*sqrt(2)*cos(3*pi/16) */
#define W5 1609 /* 2048*sqrt(2)*cos(5*pi/16) */
#define W6 1108 /* 2048*sqrt(2)*cos(6*pi/16) */
#define W7 565 /* 2048*sqrt(2)*cos(7*pi/16) */
//Discrete Cosine Transform on a row of 8 DCT coefficients.
NJ_INLINE void njRowIDCT(int* blk) {
int x0, x1, x2, x3, x4, x5, x6, x7, x8;
int t;
if (!((x1 = blk[4] << 11)
| (x2 = blk[6])
| (x3 = blk[2])
| (x4 = blk[1])
| (x5 = blk[7])
| (x6 = blk[5])
| (x7 = blk[3])))
{
blk[0] = blk[1] = blk[2] = blk[3] = blk[4] = blk[5] = blk[6] = blk[7] = blk[0] << 3;
return;
}
x0 = (blk[0] << 11) + 128; //For rounding at fourth stage
//First stage
/*What exactly are we doing here? Do the x values have a meaning?*/
x8 = W7 * (x4 + x5);
x4 = x8 + (W1 - W7) * x4;
x5 = x8 - (W1 + W7) * x5;
x8 = W3 * (x6 + x7);
x6 = x8 - (W3 - W5) * x6;
x7 = x8 - (W3 + W5) * x7;
//Second stage
x8 = x0 + x1;
x0 -= x1;
x1 = W6 * (x3 + x2);
x2 = x1 - (W2 + W6) * x2;
x3 = x1 + (W2 - W6) * x3;
x1 = x4 + x6;
x4 -= x6;
x6 = x5 + x7;
x5 -= x7;
//Third stage
x7 = x8 + x3;
x8 -= x3;
x3 = x0 + x2;
x0 -= x2;
x2 = (181 * (x4 + x5) + 128) >> 8;
x4 = (181 * (x4 - x5) + 128) >> 8;
//Fourth stage
blk[0] = (x7 + x1) >> 8; //bit shift is to emulate 8 bit fixed point precision
blk[1] = (x3 + x2) >> 8;
blk[2] = (x0 + x4) >> 8;
blk[3] = (x8 + x6) >> 8;
blk[4] = (x8 - x6) >> 8;
blk[5] = (x0 - x4) >> 8;
blk[6] = (x3 - x2) >> 8;
blk[7] = (x7 - x1) >> 8;
}
Upvotes: 4
Views: 2569
Reputation: 21561
I'm not an expert at DCTs but I have written a few FFT implementations in my time so I'm going to take a stab at answering this. Please take the following with a pinch of salt.
void njRowIDCT(int* blk)
You correctly say that the algorithm appears to be an 8-length Radix-2 DCT that uses fixed point arithmetic with 24:8 precision. I'm guessing the precision because the last stage right shifts by 8 to get the desired (that and the tell tale comment ;)
Because its 8-length, its power is 3 (2^3 = 8) meaning there are 3 stages in the DCT. So far this is all very similar to FFTs. The "fourth stage" seems to be just a scaling to recover the original precision after fixed point arithmetic.
As far as I can see the input stage is the bit-reversal from input array blk to local variables x0-x7. x8 seems to be a temporary variable. Sorry I can't be more descriptive than that.
Bit reversal stage
Update
Take a look at DSP For Scientists and Engineers. It gives a clear and precise explanation of signal processing topics. This chapter is on the DCT (please skip to p497).
The Wn (twiddle factors) correspond to the Basis Functions in this chapter, although note this is describing an 8x8 (2D) DCT.
With regard to 3 stages that I mentioned, compare to the description of an 8 point FFT:
The FFT is performing butterflies on the bit-reversed input array (which are essentially complex multiply-adds), multiplying one path by the Wn or twiddle factor along the way. The FFT is performed in stages. I still haven't worked out what your DCT code is doing but decomposing it into a diagram like this may help.
That or someone who knows what they're talking about step up ;-)
I'll revisit this page and edit as I decipher more code
Upvotes: 4
Reputation: 70733
Both the DFT and the DCT are just linear transforms, which can be represented as a single complex matrix multiply (sometime pruned for strictly real input). So you could just combine the above equations to get the formula for each final term, which should end up equivalent to one row of the linear transform matrix (ignoring rounding issues). Then see how the above code sequence is manually doing a common subexpression optimization or refactoring between and/or within row calculations.
Upvotes: 1