toozie21

Reputation: 341

Is there a way to sum multi-dimensional arrays in Verilog?

This is something that I think should be doable, but I can't work out how to do it in the HDL world. I inherited a design that sums a multidimensional array, but we have to hand-write the addition block in advance because one of the dimensions is a synthesis-time option, and we tailor the addition to it.

If I have something like reg tap_out[src][dst][tap], where src and dst are set to 4 and tap can be between 0 and 15 (16 possibilities), I want to be able to assign output[dst] to be the sum of all the tap_out entries for that particular dst.

Right now our summation block takes all the combinations of tap_out for each src and tap and sums them in pairs for each dst:
tap_out[0][dst][0]
tap_out[1][dst][0]
tap_out[2][dst][0]
tap_out[3][dst][0]
tap_out[0][dst][1]
....
tap_out[3][dst][15]

Is there a way to do this better in Verilog? In C I would use some for-loops, but that doesn't seem possible here.

Upvotes: 0

Views: 3648

Answers (2)

Greg

Reputation: 19112

For-loops work perfectly fine in this situation:

integer src_idx, tap_idx;
always @* begin
  sum = 0;
  for (src_idx=0; src_idx<4; src_idx=src_idx+1) begin
    for (tap_idx=0; tap_idx<16; tap_idx=tap_idx+1) begin
      sum = sum + tap_out[src_idx][dst][tap_idx];
    end
  end
end

The loops unroll into one large block of combinational logic during synthesis, and the result should be the same as adding up the bits line by line.
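Because the tap count in the question is a synthesis-time option, the loop bounds can come from parameters instead of literals. The following is only a minimal sketch under stated assumptions: the module name tap_sum, the parameters (NUM_SRC, NUM_DST, NUM_TAPS, WIDTH, SUM_W), and the flattened port layout are all hypothetical and not taken from the original design. The ports are flattened because plain Verilog-2001 does not allow arrays as module ports.

```verilog
// Hypothetical module: every name, width, and the flattened port layout
// below is an assumption made for illustration.
module tap_sum #(
  parameter NUM_SRC  = 4,
  parameter NUM_DST  = 4,
  parameter NUM_TAPS = 16,
  parameter WIDTH    = 8,
  // Must hold NUM_SRC*NUM_TAPS terms of WIDTH bits each.
  // The default covers 64 terms (6 extra bits); override it if the counts change.
  parameter SUM_W    = WIDTH + 6
) (
  // Element [src][dst][tap] occupies WIDTH bits starting at bit
  // ((src*NUM_DST + dst)*NUM_TAPS + tap)*WIDTH
  input  wire [NUM_SRC*NUM_DST*NUM_TAPS*WIDTH-1:0] tap_out_flat,
  // Sum for destination dst occupies SUM_W bits starting at bit dst*SUM_W
  output reg  [NUM_DST*SUM_W-1:0]                  sum_flat
);
  integer src_idx, dst_idx, tap_idx;
  reg [SUM_W-1:0] acc; // running total for one destination

  always @* begin
    for (dst_idx = 0; dst_idx < NUM_DST; dst_idx = dst_idx + 1) begin
      acc = 0;
      for (src_idx = 0; src_idx < NUM_SRC; src_idx = src_idx + 1) begin
        for (tap_idx = 0; tap_idx < NUM_TAPS; tap_idx = tap_idx + 1) begin
          // Indexed part-select picks one WIDTH-bit element out of the flat bus
          acc = acc + tap_out_flat[((src_idx*NUM_DST + dst_idx)*NUM_TAPS + tap_idx)*WIDTH +: WIDTH];
        end
      end
      sum_flat[dst_idx*SUM_W +: SUM_W] = acc;
    end
  end
endmodule
```

Changing NUM_TAPS at instantiation then resizes the adder logic without rewriting the summation by hand, provided SUM_W is kept wide enough for the new term count.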

Propagation delay through such a large adder can cause timing problems. A good synthesizer should find the best timing/area trade-off when given the clock constraint. If the logic is too complex for the synthesizer, add your own partial-sum logic that can run in parallel:

reg [`WIDTH-1:0] /*keep*/ partial_sum [3:0]; // tell synthesis to preserve these nets
integer src_idx, tap_idx;
always @* begin
  sum = 0;
  for (src_idx=0; src_idx<4; src_idx=src_idx+1) begin
    partial_sum[src_idx] = 0;
    // partial sums are independent of each other so they can run in parallel
    for (tap_idx=0; tap_idx<16; tap_idx=tap_idx+1) begin
      partial_sum[src_idx] = partial_sum[src_idx] + tap_out[src_idx][dst][tap_idx];
    end
    sum = sum + partial_sum[src_idx]; // sum the partial sums
  end
end

If timing is still an issue, then you must treat the logic as multi-cycle and sample the value some clock cycles after the input changed.
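One way to sample the value a few cycles late is a small wait counter. This is only a sketch under assumptions: the module name sum_sampler, the in_valid strobe, the SETTLE_CYCLES parameter, and the port names are all hypothetical. It also assumes the inputs hold steady while the counter runs, and that the timing tool is given a matching multi-cycle constraint for the path.

```verilog
// Hypothetical wrapper: in_valid, SETTLE_CYCLES, and all port names are
// assumptions made for illustration.
module sum_sampler #(
  parameter SUM_W         = 14,
  parameter SETTLE_CYCLES = 3   // clock edges allowed for the sum to settle (1..256)
) (
  input  wire             clk,
  input  wire             rst_L,       // active-low asynchronous reset
  input  wire             in_valid,    // one-cycle pulse: inputs have just changed
  input  wire [SUM_W-1:0] sum,         // combinational sum from the for-loops
  output reg  [SUM_W-1:0] sum_q,       // sampled copy, stable between updates
  output reg              sum_q_valid  // one-cycle pulse when sum_q is updated
);
  reg [7:0] wait_cnt; // remaining edges to wait
  reg       busy;     // a capture is pending

  always @(posedge clk or negedge rst_L) begin
    if (~rst_L) begin
      wait_cnt    <= 8'd0;
      busy        <= 1'b0;
      sum_q       <= 0;
      sum_q_valid <= 1'b0;
    end
    else begin
      sum_q_valid <= 1'b0;               // default: no new sample this cycle
      if (in_valid) begin
        wait_cnt <= SETTLE_CYCLES - 1;   // (re)start the wait
        busy     <= 1'b1;
      end
      else if (busy) begin
        if (wait_cnt == 8'd0) begin
          sum_q       <= sum;            // capture the settled value
          sum_q_valid <= 1'b1;
          busy        <= 1'b0;
        end
        else begin
          wait_cnt <= wait_cnt - 8'd1;
        end
      end
    end
  end
endmodule
```

With these assumptions, sum is captured SETTLE_CYCLES clock edges after the edge on which in_valid is seen, and downstream logic reads sum_q rather than sum.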

Upvotes: 3

Unn

Reputation: 5098

In RTL (the level of abstraction you are likely modelling with your HDL), you have to balance parallelism against time. Doing things in parallel typically saves time, but the logic takes up a lot of space. Conversely, you can make the adds completely serial (one add at a time) and store the results in a register (it sounds like you want to accumulate the total sum, so I will explain that).

It sounds like the fully parallel approach is not practical for your use (if it is and you want to rewrite it, look up generate statements). So, you'll need to create a small FSM and accumulate the sums into a register. Here's a basic example, which sums an array of 16-bit numbers (assume they are set somewhere else):

reg [15:0] arr[0:9]; // numbers
reg [31:0] result; // accumulated sum
wire [31:0] next_result; // result plus the current array element
reg load_result; // load signal for register containing result
reg clk, rst_L; // These are the clock and reset signals (reset asserted low)

/* This is a register for storing the result */
always @(posedge clk, negedge rst_L) begin
  if (~rst_L) begin
    result <= 32'd0;
  end
  else begin
    if (load_result) begin
      result <= next_result;
    end
  end
end

/* A counter for knowing which element of the array we are adding */
reg [3:0] counter, next_counter;
reg load_counter;

always @(posedge clk, negedge rst_L) begin
  if (~rst_L) begin
    counter <= 4'd0;
  end
  else begin
    if (load_counter) begin
      counter <= counter + 4'd1;
    end
  end
end

/* Perform the addition */
assign next_result = result + arr[counter];

/* Define the state machine states and state variable */
localparam IDLE = 2'd0;
localparam ADDING = 2'd1;
localparam DONE = 2'd2;
reg [1:0] state, next_state;

/* A register for holding the current state */
always @(posedge clk, negedge rst_L) begin
  if (~rst_L) begin
    state <= IDLE;
  end
  else begin
    state <= next_state;
  end
end

/* The next state and output logic, this will control the addition */
always @(*) begin
  /* Defaults */
  next_state = IDLE;
  load_result = 1'b0;
  load_counter = 1'b0;

  case (state)
    IDLE: begin
      next_state = ADDING; // Start adding now (right away)
    end
    ADDING: begin
      load_result = 1'b1; // Load in the result
      if (counter == 4'd9) begin // If we're on the last element, stop incrementing counter, we are done
        load_counter = 1'b0;
        next_state = DONE;
      end
      else begin // Otherwise, keep adding
        load_counter = 1'b1;
        next_state = ADDING;
      end
    end
    DONE: begin // finished adding, result is in result!
      next_state = DONE;
    end
  endcase
end

There are lots of resources on the web explaining FSMs if you are having trouble with the concept, but they can be used to implement your basic C-style for loop.

Upvotes: 1
