Reputation: 341
I have a block of code on a Virtex6 that sums a bunch of things at once. I inherited the code and it seems a little more difficult than I would have imagined, but I was told that this was the best way to sum things quickly.
Basically, if I have a series of values that need to be added (say 16 of them), we currently add them in multiple levels (summedOutput and additionOverflow are two functions that do the signed addition and detect overflows):
tmpSig_0_0 <= summedOutput(inSig_0, inSig_1);
tmpSig_0_1 <= summedOutput(inSig_2, inSig_3);
tmpSig_0_2 <= summedOutput(inSig_4, inSig_5);
tmpSig_0_3 <= summedOutput(inSig_6, inSig_7);
tmpSig_0_4 <= summedOutput(inSig_8, inSig_9);
tmpSig_0_5 <= summedOutput(inSig_10, inSig_11);
tmpSig_0_6 <= summedOutput(inSig_12, inSig_13);
tmpSig_0_7 <= summedOutput(inSig_14, inSig_15);
overflow_stage0 <= (| {overflow_input,additionOverflow(inSig_0,inSig_1,inSig_0+inSig_1),additionOverflow(inSig_2,inSig_3,inSig_2+inSig_3),additionOverflow(inSig_4,inSig_5,inSig_4+inSig_5),additionOverflow(inSig_6,inSig_7,inSig_6+inSig_7),additionOverflow(inSig_8,inSig_9,inSig_8+inSig_9),additionOverflow(inSig_10,inSig_11,inSig_10+inSig_11),additionOverflow(inSig_12,inSig_13,inSig_12+inSig_13),additionOverflow(inSig_14,inSig_15,inSig_14+inSig_15)});
tmpSig_1_0 <= summedOutput(tmpSig_0_0, tmpSig_0_1);
tmpSig_1_1 <= summedOutput(tmpSig_0_2, tmpSig_0_3);
tmpSig_1_2 <= summedOutput(tmpSig_0_4, tmpSig_0_5);
tmpSig_1_3 <= summedOutput(tmpSig_0_6, tmpSig_0_7);
overflow_stage1 <= (| {overflow_stage0, additionOverflow(tmpSig_0_0,tmpSig_0_1,tmpSig_0_0+tmpSig_0_1), additionOverflow(tmpSig_0_2,tmpSig_0_3,tmpSig_0_2+tmpSig_0_3), additionOverflow(tmpSig_0_4,tmpSig_0_5,tmpSig_0_4+tmpSig_0_5), additionOverflow(tmpSig_0_6,tmpSig_0_7,tmpSig_0_6+tmpSig_0_7)});
tmpSig_2_0 <= summedOutput(tmpSig_1_0, tmpSig_1_1);
tmpSig_2_1 <= summedOutput(tmpSig_1_2, tmpSig_1_3);
overflow_stage2 <= (| {overflow_stage1, additionOverflow(tmpSig_1_0,tmpSig_1_1,tmpSig_1_0+tmpSig_1_1), additionOverflow(tmpSig_1_2,tmpSig_1_3,tmpSig_1_2+tmpSig_1_3)});
outSig <= summedOutput(tmpSig_2_0, tmpSig_2_1);
overflow <= (| {overflow_stage2, additionOverflow(tmpSig_2_0, tmpSig_2_1, tmpSig_2_0+tmpSig_2_1)});
I was told that this results in 4 levels of additions (which makes sense), and is much better than the 16 levels I would get if I had just written:
outSig <= inSig_0 + inSig_1 + inSig_2 + inSig_3 .... inSig_14 + inSig_15;
My problem is that if I want to expand things, it is a very manual process and not very adaptive. Is there a smarter way of doing this? Ideally I would use a series of for-loops that add things based on a parameter size, but then I would essentially end up with the second example above, and that could be pretty deep.
Upvotes: 1
Views: 1652
Reputation: 830
How about a generate loop? I haven't tested this code fragment; I include it here for illustration.
// Assumes inSig and tmp_sig_l1/l2/l3 are declared as arrays of signals,
// and oflw_l1/l2/l3 as bit vectors with one bit per adder.
parameter num_of_additions = 16;
genvar i, j, k;
generate
  // Level 3: num_of_additions/8 adders, each summing a pair of level-2 results
  for (k = 0; k < (num_of_additions >> 3); k = k + 1)
  begin : adder_l3
    assign tmp_sig_l3[k] = summedOutput(tmp_sig_l2[k*2], tmp_sig_l2[(k*2)+1]);
    assign oflw_l3[k] = additionOverflow(tmp_sig_l2[k*2], tmp_sig_l2[(k*2)+1],
                                         tmp_sig_l2[k*2] + tmp_sig_l2[(k*2)+1]);
  end
  // Level 2: num_of_additions/4 adders, each summing a pair of level-1 results
  for (j = 0; j < (num_of_additions >> 2); j = j + 1)
  begin : adder_l2
    assign tmp_sig_l2[j] = summedOutput(tmp_sig_l1[j*2], tmp_sig_l1[(j*2)+1]);
    assign oflw_l2[j] = additionOverflow(tmp_sig_l1[j*2], tmp_sig_l1[(j*2)+1],
                                         tmp_sig_l1[j*2] + tmp_sig_l1[(j*2)+1]);
  end
  // Level 1: num_of_additions/2 adders, each summing a pair of inputs
  for (i = 0; i < (num_of_additions >> 1); i = i + 1)
  begin : adder_l1
    assign tmp_sig_l1[i] = summedOutput(inSig[i*2], inSig[(i*2)+1]);
    assign oflw_l1[i] = additionOverflow(inSig[i*2], inSig[(i*2)+1],
                                         inSig[i*2] + inSig[(i*2)+1]);
  end
endgenerate
// This last level is not parameterized
assign outSig = summedOutput(tmp_sig_l3[0], tmp_sig_l3[1]);
assign overflow = (| {oflw_l3, oflw_l2, oflw_l1,
                      additionOverflow(tmp_sig_l3[0], tmp_sig_l3[1],
                                       tmp_sig_l3[0] + tmp_sig_l3[1])});
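The fragment above still hard-codes the number of levels. As a rough, untested sketch of how the whole tree could be driven by a single parameter, the nodes can be stored in a heap-style array, where node i adds its children 2i+1 and 2i+2 and the inputs are the leaves. Everything here is an assumption made for illustration: the module name adder_tree, the flattened in_flat bus, the parameters N and W, and the requirement that N is a power of two. Since the bodies of summedOutput and additionOverflow are not shown, the sketch uses a plain wrapping add and the usual signed-overflow test in their place.

```verilog
// Hypothetical module: names, port layout and parameters are illustrative
// assumptions, not taken from the original design.
module adder_tree #(
    parameter N = 16,   // number of inputs; assumed to be a power of two
    parameter W = 16    // width of each signed input
) (
    input  wire                clk,
    input  wire [N*W-1:0]      in_flat,   // input i occupies bits [i*W +: W]
    output wire signed [W-1:0] sum,
    output wire                overflow
);
    // Heap layout: node 0 is the root, node i has children 2i+1 and 2i+2,
    // and the N leaves sit at indices N-1 .. 2N-2.
    wire signed [W-1:0] node [0:2*N-2];
    wire                ovf  [0:2*N-2];   // sticky overflow flag per node

    genvar g;
    generate
        // Leaves: copy the inputs in, with no overflow yet
        for (g = 0; g < N; g = g + 1) begin : leaf
            assign node[N-1+g] = in_flat[g*W +: W];
            assign ovf[N-1+g]  = 1'b0;
        end
        // Internal nodes: one registered adder each, so every level of the
        // tree is one pipeline stage (log2(N) clock cycles of latency)
        for (g = 0; g < N-1; g = g + 1) begin : add
            wire signed [W-1:0] a = node[2*g+1];
            wire signed [W-1:0] b = node[2*g+2];
            wire signed [W-1:0] r = a + b;
            reg  signed [W-1:0] s;
            reg                 o;
            always @(posedge clk) begin
                s <= r;
                // Signed overflow: both operands share a sign that the
                // result does not have. OR in any overflow from below.
                o <= ovf[2*g+1] | ovf[2*g+2]
                   | ((a[W-1] == b[W-1]) && (r[W-1] != a[W-1]));
            end
            assign node[g] = s;
            assign ovf[g]  = o;
        end
    endgenerate

    assign sum      = node[0];
    assign overflow = ovf[0];
endmodule
```

With N = 16 this should give the same 4 registered levels as the hand-written version in the question. Changing N to 32 or 64 adds levels without touching the code, at the cost of one extra clock cycle of latency per doubling.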
Upvotes: 0
Reputation: 35943
The main difference between the existing (long) code and your one-line solution is that the existing code is pipelined, taking 4 clock cycles to compute the result, whereas your one-line solution tries to add all the numbers in a single clock cycle.
Depending on how wide these inSigs are, it may not meet timing to try to add them all up at once. You could certainly do a trial and replace that code with your suggestion, but then you should run synthesis and P&R and see what your timing reports look like. If it meets your target clock frequency, then you can go ahead and replace it with your solution safely.
Alternatively, you could try making a shallower pipeline, which would be cleaner (can you add 4 signals in a clock cycle instead of 2?):
tmpSig_0 <= inSig_0 + inSig_1 + inSig_2 + inSig_3;
tmpSig_1 <= inSig_4 + inSig_5 + inSig_6 + inSig_7;
tmpSig_2 <= inSig_8 + inSig_9 + inSig_10 + inSig_11;
tmpSig_3 <= inSig_12 + inSig_13 + inSig_14 + inSig_15;
outSig <= tmpSig_0 + tmpSig_1 + tmpSig_2 + tmpSig_3;
This would be a 5-line, 2-stage pipeline instead of much more complex code, but again you need to check whether it meets your timing requirements.
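For completeness, here is a minimal, untested sketch of how that 2-stage version might look as a self-contained module. The module name sum16_two_stage, the flattened in_flat bus and the parameter W are assumptions for illustration. Instead of detecting overflow, this sketch widens the intermediate and final sums (2 extra bits per group of four, 4 extra bits in total) so that the additions cannot overflow. That is a different approach from the summedOutput/additionOverflow functions in the question.

```verilog
// Hypothetical wrapper: module name, ports and widths are assumptions.
module sum16_two_stage #(
    parameter W = 16                      // width of each signed input
) (
    input  wire                clk,
    input  wire [16*W-1:0]     in_flat,   // input i occupies bits [i*W +: W]
    output reg  signed [W+3:0] outSig     // 4 extra bits hold a sum of 16 inputs
);
    reg signed [W+1:0] tmpSig [0:3];      // 2 extra bits hold a sum of 4 inputs
    integer i;

    always @(posedge clk) begin
        // Stage 1: four groups of four inputs, all added in one clock cycle
        for (i = 0; i < 4; i = i + 1)
            tmpSig[i] <= $signed(in_flat[(4*i+0)*W +: W])
                       + $signed(in_flat[(4*i+1)*W +: W])
                       + $signed(in_flat[(4*i+2)*W +: W])
                       + $signed(in_flat[(4*i+3)*W +: W]);
        // Stage 2: add the four partial sums from the previous clock cycle
        outSig <= tmpSig[0] + tmpSig[1] + tmpSig[2] + tmpSig[3];
    end
endmodule
```

The result appears two clock cycles after the inputs are presented. Whether three chained additions per stage fit in one clock period depends on W and the target frequency, so the timing report is still the final word.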
Upvotes: 2