Best way to sum many things on an FPGA

Question

I have a block of code on a Virtex-6 that sums a number of values at once. I inherited the code, and it seems more complicated than I would have expected, but I was told that this is the best way to sum things quickly.

Basically, if I have a series of values that need to be added (say 16 of them), we currently add them in multiple levels (summedOutput and additionOverflow are two functions that perform the signed addition and detect overflow):

    tmpSig_0_0 <= summedOutput(inSig_0, inSig_1);
    tmpSig_0_1 <= summedOutput(inSig_2, inSig_3);
    tmpSig_0_2 <= summedOutput(inSig_4, inSig_5);
    tmpSig_0_3 <= summedOutput(inSig_6, inSig_7);
    tmpSig_0_4 <= summedOutput(inSig_8, inSig_9);
    tmpSig_0_5 <= summedOutput(inSig_10, inSig_11);
    tmpSig_0_6 <= summedOutput(inSig_12, inSig_13);
    tmpSig_0_7 <= summedOutput(inSig_14, inSig_15);
    overflow_stage0 <= (| {overflow_input,
                           additionOverflow(inSig_0, inSig_1, inSig_0+inSig_1),
                           additionOverflow(inSig_2, inSig_3, inSig_2+inSig_3),
                           additionOverflow(inSig_4, inSig_5, inSig_4+inSig_5),
                           additionOverflow(inSig_6, inSig_7, inSig_6+inSig_7),
                           additionOverflow(inSig_8, inSig_9, inSig_8+inSig_9),
                           additionOverflow(inSig_10, inSig_11, inSig_10+inSig_11),
                           additionOverflow(inSig_12, inSig_13, inSig_12+inSig_13),
                           additionOverflow(inSig_14, inSig_15, inSig_14+inSig_15)});

    tmpSig_1_0 <= summedOutput(tmpSig_0_0, tmpSig_0_1);
    tmpSig_1_1 <= summedOutput(tmpSig_0_2, tmpSig_0_3);
    tmpSig_1_2 <= summedOutput(tmpSig_0_4, tmpSig_0_5);
    tmpSig_1_3 <= summedOutput(tmpSig_0_6, tmpSig_0_7);
    overflow_stage1 <= (| {overflow_stage0,
                           additionOverflow(tmpSig_0_0, tmpSig_0_1, tmpSig_0_0+tmpSig_0_1),
                           additionOverflow(tmpSig_0_2, tmpSig_0_3, tmpSig_0_2+tmpSig_0_3),
                           additionOverflow(tmpSig_0_4, tmpSig_0_5, tmpSig_0_4+tmpSig_0_5),
                           additionOverflow(tmpSig_0_6, tmpSig_0_7, tmpSig_0_6+tmpSig_0_7)});

    tmpSig_2_0 <= summedOutput(tmpSig_1_0, tmpSig_1_1);
    tmpSig_2_1 <= summedOutput(tmpSig_1_2, tmpSig_1_3);
    overflow_stage2 <= (| {overflow_stage1,
                           additionOverflow(tmpSig_1_0, tmpSig_1_1, tmpSig_1_0+tmpSig_1_1),
                           additionOverflow(tmpSig_1_2, tmpSig_1_3, tmpSig_1_2+tmpSig_1_3)});

    outSig <= summedOutput(tmpSig_2_0, tmpSig_2_1);
    overflow <= (| {overflow_stage2, additionOverflow(tmpSig_2_0, tmpSig_2_1, tmpSig_2_0+tmpSig_2_1)});

I was told that this results in 4 levels of addition (which makes sense), and is much better than the 15 chained additions I would get if I had just written:

    outSig <= inSig_0 + inSig_1 + inSig_2 + inSig_3 .... inSig_14 + inSig_15;

My problem is that expanding this is a very manual process and not very adaptable. Is there a smarter way of doing this? Ideally I would use a series of for-loops that add things based on a parameter size, but then I would essentially end up with the second example above, and that could be pretty deep.

Tim · Accepted Answer

The main difference between the existing (long) code and your one-line solution is that the existing code is pipelined over 4 clock cycles, whereas your one-line solution tries to add all the numbers in a single clock cycle.

Depending on how wide these inSigs are, adding them all at once may not meet timing. You could certainly replace that code with your suggestion as a trial, but then run synthesis and P&R and check the timing reports. If the design meets your target clock frequency, you can safely use your solution.

Alternatively, you could try a shallower pipeline, which would be cleaner (can you add 4 signals in a clock cycle instead of 2?):

    tmpSig_0 <= inSig_0 + inSig_1 + inSig_2 + inSig_3;
    tmpSig_1 <= inSig_4 + inSig_5 + inSig_6 + inSig_7;
    tmpSig_2 <= inSig_8 + inSig_9 + inSig_10 + inSig_11;
    tmpSig_3 <= inSig_12 + inSig_13 + inSig_14 + inSig_15;

    outSig   <= tmpSig_0 + tmpSig_1 + tmpSig_2 + tmpSig_3;

This would be a five-line, two-stage pipeline instead of much more complex code, but again you need to check that it meets your timing requirements. Note that this version leaves out the overflow detection.
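
If you want the for-loop version you asked about without losing the pipelining, one option is to keep a register at every level of the tree and let a parameter set the depth. The following is only a minimal sketch, not a drop-in replacement for the inherited code. The module name adder_tree, the parameters LEVELS and W, and the ports in_flat and sum are all hypothetical. It assumes the number of inputs is a power of two (2**LEVELS), that the inputs are signed, and that they arrive packed into one flat vector. Instead of your summedOutput/additionOverflow functions, it assumes the sum is allowed to grow by one bit per level, so no overflow can occur inside the tree.

```verilog
// Hypothetical parameterized, pipelined adder tree (illustrative names only).
// Assumptions: 2**LEVELS signed inputs of W bits each, LEVELS >= 1, and a
// result that is allowed to be W+LEVELS bits wide (one extra bit per level).
module adder_tree #(
    parameter LEVELS = 4,    // tree depth; the module sums 2**LEVELS inputs
    parameter W      = 16    // width of each signed input
) (
    input  wire                       clk,
    input  wire [(1<<LEVELS)*W-1:0]   in_flat,  // input i is in_flat[i*W +: W]
    output wire signed [W+LEVELS-1:0] sum       // valid LEVELS clocks after the inputs
);
    localparam N  = 1 << LEVELS;   // number of inputs
    localparam SW = W + LEVELS;    // full-precision sum width

    // Registered partial sums stored like a binary heap: node[1] is the root,
    // and the children of node[k] are node[2*k] and node[2*k+1].
    // Nodes N/2 .. N-1 form the first adder level and read the inputs directly.
    reg signed [SW-1:0] node [1:N-1];

    integer k;
    always @(posedge clk) begin
        // First level: add adjacent input pairs. All operands are signed, so
        // they are sign-extended to SW bits before the addition.
        for (k = N/2; k < N; k = k + 1)
            node[k] <= $signed(in_flat[(2*k-N)*W   +: W])
                     + $signed(in_flat[(2*k-N+1)*W +: W]);

        // Remaining levels: each node is the registered sum of its two children,
        // so every level of the tree adds exactly one clock of latency.
        for (k = 1; k < N/2; k = k + 1)
            node[k] <= node[2*k] + node[2*k+1];
    end

    assign sum = node[1];
endmodule
```

With LEVELS = 4 this has the same four-clock latency as the hand-written version, and in_flat would be the concatenation of your sixteen inputs with inSig_0 in the least significant W bits. If you need the original output width and overflow flag, you would have to truncate or saturate sum after the tree and derive the flag from the discarded upper bits. That part is not shown here, and as with the other options you still need to check the timing report.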
