Background: I'm trying to create a behavioral file for multiplying three matrices. I'm trying to debug it by first seeing if I can read the input matrix and then output the intermediate matrix.
Behavior File:
library ieee;
USE ieee.std_logic_1164.ALL;

entity DCT_beh is
  port (
    Clk   : in  std_logic;
    Start : in  std_logic;
    Din   : in  INTEGER;
    Done  : out std_logic;
    Dout  : out INTEGER
  );
end DCT_beh;

architecture behavioral of DCT_beh is
  type RF is array ( 0 to 7, 0 to 7 ) of INTEGER;
begin
  process
    variable i, j, k : INTEGER;
    variable InBlock : RF;
    variable COSBlock : RF;
    variable TempBlock : RF;
    variable OutBlock : RF;
    variable A, B, P, Sum : INTEGER;
  begin
    COSBlock := (
      ( 125, 122, 115, 103, 88, 69, 47, 24 ),
      ( 125, 103, 47, -24, -88, -122, -115, -69 ),
      ( 125, 69, -47, -122, -88, 24, 115, 103 ),
      ( 125, 24, -115, -69, 88, 103, -47, -122 ),
      ( 125, -24, -115, 69, 88, -103, -47, 122 ),
      ( 125, -69, -47, 122, -88, -24, 115, -103 ),
      ( 125, -103, 47, 24, -88, 122, -115, 69 ),
      ( 125, -122, 115, -103, 88, -69, 47, -24 )
    );
    wait until Start = '1';
    Done <= '0';
    --Read Input Data
    for i in 0 to 7 loop
      for j in 0 to 7 loop
        wait until Clk = '1' and clk'event;
        InBlock(i,j) := Din;
      end loop;
    end loop;
    --TempBlock = COSBLOCK * InBlock
    for i in 0 to 7 loop
      for j in 0 to 7 loop
        Sum := 0;
        for k in 0 to 7 loop
          A := COSBlock( i, k );
          B := InBlock( k, j );
          P := A * B;
          Sum := Sum + P;
          if( k = 7 ) then
            TempBlock( i, j ) := Sum;
          end if;
        end loop;
      end loop;
    end loop;
    wait until Clk = '1' and Clk'event;
    Done <= '1';
    --Output Data
    for i in 0 to 7 loop
      for j in 0 to 7 loop
        wait until Clk = '1' and Clk'event;
        Done <= '0';
        Dout <= tempblock(i,j);
      end loop;
    end loop;
  end process;
end behavioral;
Testbench File:
library ieee;
USE ieee.std_logic_1164.ALL;
-- Uncomment the following library declaration if using
-- arithmetic functions with Signed or Unsigned values
--USE ieee.numeric_std.ALL;

ENTITY lab4b_tb IS
END lab4b_tb;

ARCHITECTURE behavior OF lab4b_tb IS
  -- Component Declaration for the Unit Under Test (UUT)
  COMPONENT DCT_beh
    PORT(
      Clk   : IN  std_logic;
      Start : IN  std_logic;
      Din   : IN  INTEGER;
      Done  : OUT std_logic;
      Dout  : OUT INTEGER
    );
  END COMPONENT;

  signal Clk : std_logic := '0';
  signal Start : std_logic := '0';
  signal Din : INTEGER;
  signal Done : std_logic;
  signal Dout : INTEGER;

  -- Clock period definitions
  constant Clk_period : time := 10 ns;
BEGIN
  -- Instantiate the Unit Under Test (UUT)
  uut: DCT_beh PORT MAP (
    Clk => Clk,
    Start => Start,
    Din => Din,
    Done => Done,
    Dout => Dout
  );

  -- Clock process definitions
  Clk_process :process
  begin
    Clk <= '0';
    wait for Clk_period/2;
    Clk <= '1';
    wait for Clk_period/2;
  end process;

  -- Stimulus process
  stim_proc: process
    variable i, j : INTEGER;
    variable cnt : INTEGER;
  begin
    -- hold reset state for 100 ns.
    wait for 100 ns;
    start <= '1';
    wait for clk_period;
    start <= '0';
    for cnt in 0 to 63 loop
      wait until clk = '1' and clk'event;
      din <= cnt;
    end loop;
    --wait for 100 ns;
    --start <= '1';
    --wait for clk_period;
    --start <= '0';
    --for i in 0 to 63 loop
    --  wait for clk_period;
    --  if (i < 24) then
    --    din <= 255;
    --  elsif (i > 40) then
    --    din <= 255;
    --  else
    --    din <= 0;
    --  end if;
    --end loop;
  end process;
END;
As I understand my design, when Start = 1 the matrix is read into InBlock. In this case the matrix is just filled with unique incrementing values from 0 to 63. Then, when Done = 1, I output the multiplied matrix (TempBlock in the code above). The problem is that in my simulation I receive some values that are supposed to be in the final matrix, but they aren't in the correct order. For example, the line below contains the first row of the multiplied matrix, TempBlock:
14464.000 15157.000 15850.000 16543.000 17236.000 17929.000 18622.000 19315.000
As you can see in the picture of my simulation, I get some of those values, but then the signal becomes some weird large value.
I suspect that din(0), din(1), din(2), ..., din(n) may not correspond to InBlock(0,0), InBlock(0,1), InBlock(0,2), etc., but I went over my behavioral file thoroughly and don't see any issues with it. Is there something wrong with how I've designed my testbench?
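A quick, self-contained way to check both the expected row and the din(n)-to-InBlock mapping is to compute row 0 of COSBlock * InBlock directly, assuming the 64 inputs are loaded row-major (datum n goes to InBlock(n / 8, n mod 8), which is what the nested i/j input loop does). The entity name check_row0 and the constants COSRow0 and Expected are hypothetical and not part of either file; only row 0 of COSBlock and the first-row values quoted above are used.

```vhdl
-- Hypothetical self-checking harness: pure arithmetic, no DCT_beh involved
entity check_row0 is
end entity check_row0;

architecture sim of check_row0 is
  type row8 is array (0 to 7) of integer;
  -- Row 0 of the question's COSBlock
  constant COSRow0  : row8 := (125, 122, 115, 103, 88, 69, 47, 24);
  -- First row of TempBlock as quoted in the question
  constant Expected : row8 := (14464, 15157, 15850, 16543, 17236, 17929, 18622, 19315);
begin
  process
    variable Sum : integer;
  begin
    for j in 0 to 7 loop
      Sum := 0;
      for k in 0 to 7 loop
        -- Din stream 0..63 loaded row-major: datum n lands in InBlock(n / 8, n mod 8),
        -- so InBlock(k, j) holds 8 * k + j
        Sum := Sum + COSRow0(k) * (8 * k + j);
      end loop;
      assert Sum = Expected(j)
        report "TempBlock(0," & integer'image(j) & ") = " & integer'image(Sum)
        severity failure;
    end loop;
    report "row 0 of TempBlock matches the expected values";
    wait;
  end process;
end architecture sim;
```

The row steps by 693 because that is the sum of row 0 of COSBlock, and moving one column to the right adds 1 to every InBlock(k, j) in the column. With ghdl this runs as `ghdl -a`, `ghdl -e check_row0`, `ghdl -r check_row0`; every assert uses severity failure, so reaching the final report means the arithmetic holds.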
EDIT: I need help getting the correct output for this stimulus:
for i in 0 to 63 loop
  wait until clk = '1' and clk'event;
  if i = 0 then
    Start <= '1','0' after clk_period;
  end if;
  if (i < 24) then
    din <= 255;
  elsif (i > 40) then
    din <= 255;
  else
    din <= 0;
  end if;
end loop;
I thought it would be similar to the code in the answer, but I ran into the exact same issue. How would this be fixed? Here is a picture of what is currently output. The correct values are there, but shifted by one clock period.
FINAL EDIT: Solved it myself. The problem was with the loop boundaries.
Here's what looks to be a working version of your model and its testbench.
Added (and updated)
If you were to make the matrix multiply take real time (clocks), you'd see Done delayed by the number of clocks the multiply took. I arbitrarily picked two clocks just to show the benefit of the added register files.
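To see Done move out when the multiply takes real clocks, here is a sketch under an assumption of my own: the multiply computes one TempBlock row per rising edge (8 clocks), rather than the 2 clocks used in the model below. The names multicycle_demo, Go and sim_over are hypothetical; Go stands in for input_rdy_detected, and the input block is the question's 0..63 ramp.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical demo: a matrix multiply spread over 8 clocks, one row per edge
entity multicycle_demo is
end entity multicycle_demo;

architecture sim of multicycle_demo is
  type RF is array (0 to 7, 0 to 7) of integer;
  constant COSBlock : RF := (
    ( 125,  122,  115,  103,   88,   69,   47,   24 ),
    ( 125,  103,   47,  -24,  -88, -122, -115,  -69 ),
    ( 125,   69,  -47, -122,  -88,   24,  115,  103 ),
    ( 125,   24, -115,  -69,   88,  103,  -47, -122 ),
    ( 125,  -24, -115,   69,   88, -103,  -47,  122 ),
    ( 125,  -69,  -47,  122,  -88,  -24,  115, -103 ),
    ( 125, -103,   47,   24,  -88,  122, -115,   69 ),
    ( 125, -122,  115, -103,   88,  -69,   47,  -24 )
  );
  constant Clk_period : time := 10 ns;
  signal Clk       : std_logic := '0';
  signal Go        : std_logic := '0';  -- hypothetical "input block ready" strobe
  signal Done      : std_logic := '0';
  signal TempBlock : RF := (others => (others => 0));
  signal sim_over  : boolean := false;
begin
  Clk_process: process
  begin
    if sim_over then
      wait;  -- stop the clock so the simulation ends
    end if;
    Clk <= '0';
    wait for Clk_period / 2;
    Clk <= '1';
    wait for Clk_period / 2;
  end process;

  -- TempBlock = COSBlock * InBlock, one row per rising edge
  multiply: process
    variable InBlock : RF;
    variable Sum     : integer;
  begin
    for n in 0 to 63 loop
      InBlock(n / 8, n mod 8) := n;  -- the question's 0..63 block, row-major
    end loop;
    wait until Go = '1';
    for i in 0 to 7 loop
      wait until Clk = '1' and Clk'event;  -- row i costs one clock
      for j in 0 to 7 loop
        Sum := 0;
        for k in 0 to 7 loop
          Sum := Sum + COSBlock(i, k) * InBlock(k, j);
        end loop;
        TempBlock(i, j) <= Sum;
      end loop;
    end loop;
    Done <= '1', '0' after Clk_period + 1 ns;  -- issued with the last row
    wait;
  end process;

  checker: process
    variable edges : natural := 0;
  begin
    wait until Clk = '1' and Clk'event;
    Go <= '1', '0' after Clk_period + 1 ns;
    loop
      wait until Clk = '1' and Clk'event;
      edges := edges + 1;
      exit when Done = '1';
    end loop;
    -- Done is issued on the 8th edge and first sampled high on the 9th
    assert edges = 9
      report "Done seen after " & integer'image(edges) & " edges"
      severity failure;
    assert TempBlock(0, 0) = 14464 and TempBlock(0, 7) = 19315
      report "row 0 of TempBlock is wrong" severity failure;
    report "Done seen " & integer'image(edges) & " clocks after Go";
    sim_over <= true;
    wait;
  end process;
end architecture sim;
```

The extra edge in the count is the same one-clock lag that the done_detected flip-flop in the model below absorbs.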
I'll comment on the interesting parts of the code.
library ieee;
USE ieee.std_logic_1164.ALL;

ENTITY lab4b_tb IS
END lab4b_tb;

ARCHITECTURE behavior OF lab4b_tb IS
  signal Clk: std_logic := '0'; -- no reset
  signal Start: std_logic := '0'; -- no reset
  signal Din: INTEGER := 0; -- no reset
  signal Done : std_logic;
  signal Dout : INTEGER;
  constant Clk_period : time := 10 ns;
BEGIN
  uut: entity work.DCT_beh -- DCT_beh
    port map (
      Clk => Clk,
      Start => Start,
      Din => Din,
      Done => Done,
      Dout => Dout
    );

  Clk_process: process
  begin
    Clk <= '0';
    wait for Clk_period/2;
    Clk <= '1';
    wait for Clk_period/2;
  end process;

  stim_proc: process
    variable i, j : INTEGER;
    variable cnt : INTEGER;
  begin
    wait until clk = '1' and clk'event; -- sync Start to clk
    Start <= '1','0' after 11 ns; --issued same time as datum 0
    for i in 0 to 63 loop
      if (i < 24) then
        din <= 255;
      elsif (i > 40) then
        din <= 255;
      else
        din <= 0;
      end if;
      wait until clk = '1' and clk'event;
    end loop;
    Start <= '1','0' after 11 ns; -- with first datum
    for cnt in 0 to 63 loop
      din <= cnt;
      wait until clk = '1' and clk'event;
    end loop;
    din <= 0; -- to show the last input datum clearly
  end process;
END behavior;
The two input blocks are your new block values and your original block values, which provided an index for the first output block. The second block also gives the same answers as originally, validating the Done handshaking.
Note that Start is concurrent with the first datum of each block.
I also adjusted the input stimulus to start on a clock boundary, so that the first Start doesn't show up on a falling clock edge.
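The Start-with-datum-0 convention can be checked in isolation. This is a sketch with a hypothetical 8-datum receiver (process rx in a hypothetical entity start_with_datum0) written like DCT_beh's input loop: wait for Start, then sample Din on each following rising edge. The stimulus assigns Din before waiting for the edge, as the testbench above does. A one-clock misalignment between Start and the first datum would make the asserts fail.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical demo: Start issued with datum 0, data assigned before each edge
entity start_with_datum0 is
end entity start_with_datum0;

architecture sim of start_with_datum0 is
  constant Clk_period : time := 10 ns;
  type int_array is array (0 to 7) of integer;
  signal Clk      : std_logic := '0';
  signal Start    : std_logic := '0';
  signal Din      : integer := -1;  -- -1 marks "no datum yet"
  signal sim_over : boolean := false;
begin
  Clk_process: process
  begin
    if sim_over then
      wait;  -- stop the clock so the simulation ends
    end if;
    Clk <= '0';
    wait for Clk_period / 2;
    Clk <= '1';
    wait for Clk_period / 2;
  end process;

  -- Stimulus in the answer's style: sync to clk, assign, then wait
  stim: process
  begin
    wait until Clk = '1' and Clk'event;  -- sync Start to clk
    Start <= '1', '0' after 11 ns;       -- issued with datum 0
    for n in 0 to 7 loop
      Din <= n;
      wait until Clk = '1' and Clk'event;
    end loop;
    wait;
  end process;

  -- Receiver in the style of DCT_beh's input loop
  rx: process
    variable got : int_array;
  begin
    wait until Start = '1';
    for n in 0 to 7 loop
      wait until Clk = '1' and Clk'event;
      got(n) := Din;  -- reads the value assigned one clock earlier
    end loop;
    for n in 0 to 7 loop
      assert got(n) = n
        report "datum " & integer'image(n) & " captured as " & integer'image(got(n))
        severity failure;
    end loop;
    report "all 8 data captured in order";
    sim_over <= true;
    wait;
  end process;
end architecture sim;
```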
Where pulses are generated asynchronously, I extended them by a nanosecond to ensure they'd be seen on a clock edge, because they weren't generated on a clock edge.
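One way to see why the extra nanosecond matters is delta-cycle ordering. A pulse assigned just after a rising edge with '0' after exactly one period returns to '0' in the first delta of the next edge's time step, while this style of clock process only changes Clk one delta later, so a process sampling on that edge sees '0'. The sketch below (hypothetical entity pulse_width_demo and signals p10/p11, with the same 10 ns clock process as the testbench) counts how many rising edges see each pulse high.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical demo: a one-period pulse vs. a pulse extended by 1 ns
entity pulse_width_demo is
end entity pulse_width_demo;

architecture sim of pulse_width_demo is
  constant Clk_period : time := 10 ns;
  signal Clk      : std_logic := '0';
  signal p10, p11 : std_logic := '0';  -- 10 ns and 11 ns pulses
  signal sim_over : boolean := false;
begin
  Clk_process: process
  begin
    if sim_over then
      wait;  -- stop the clock so the simulation ends
    end if;
    Clk <= '0';
    wait for Clk_period / 2;
    Clk <= '1';
    wait for Clk_period / 2;
  end process;

  -- Both pulses are generated just after a rising edge, like Start
  gen: process
  begin
    wait until Clk = '1' and Clk'event;
    p10 <= '1', '0' after Clk_period;         -- exactly one period
    p11 <= '1', '0' after Clk_period + 1 ns;  -- extended by 1 ns
    wait;
  end process;

  -- Count the rising edges at which each pulse is sampled high
  sample: process
    variable n10, n11 : natural := 0;
  begin
    for e in 1 to 5 loop
      wait until Clk = '1' and Clk'event;
      if p10 = '1' then n10 := n10 + 1; end if;
      if p11 = '1' then n11 := n11 + 1; end if;
    end loop;
    assert n10 = 0 report "10 ns pulse was sampled" severity failure;
    assert n11 = 1 report "11 ns pulse not sampled exactly once" severity failure;
    report "10 ns pulse seen on " & integer'image(n10) &
           " edges, 11 ns pulse on " & integer'image(n11);
    sim_over <= true;
    wait;
  end process;
end architecture sim;
```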
library ieee;
USE ieee.std_logic_1164.ALL;

entity DCT_beh is
  port (
    Clk   : in  std_logic;
    Start : in  std_logic;
    Din   : in  INTEGER;
    Done  : out std_logic;
    Dout  : out INTEGER
  );
end DCT_beh;

architecture behavioral of DCT_beh is
  type RF is array ( 0 to 7, 0 to 7 ) of INTEGER;
  signal OutBlock: RF;
  signal InBlock: RF;
  signal internal_Done: std_logic := '0'; -- no reset
  signal Input_Ready: std_logic := '0'; -- no reset
  signal done_detected: std_logic := '0'; -- no reset
  signal input_rdy_detected: std_logic := '0'; -- no reset
  signal last_out: std_logic := '0'; -- no reset
begin

  -- Input process: load 64 data after Start, flag the last one
  process
  begin
    wait until Start = '1';
    --Read Input Data
    for i in 0 to 7 loop
      for j in 0 to 7 loop
        wait until Clk = '1' and clk'event;
        InBlock(i,j) <= Din;
        if i=7 and j=7 then
          Input_Ready <= '1', '0' after 11 ns;
        end if;
      end loop;
    end loop;
  end process;

  -- Input_Ready flip-flop
  process
  begin
    wait until clk = '1' and clk'event;
    input_rdy_detected <= Input_Ready;
    --InBlock valid after the following rising edge of clk
  end process;

  -- Matrix multiply process
  process
    variable InpBlock : RF;
    constant COSBlock : RF := (
      ( 125, 122, 115, 103, 88, 69, 47, 24 ),
      ( 125, 103, 47, -24, -88, -122, -115, -69 ),
      ( 125, 69, -47, -122, -88, 24, 115, 103 ),
      ( 125, 24, -115, -69, 88, 103, -47, -122 ),
      ( 125, -24, -115, 69, 88, -103, -47, 122 ),
      ( 125, -69, -47, 122, -88, -24, 115, -103 ),
      ( 125, -103, 47, 24, -88, 122, -115, 69 ),
      ( 125, -122, 115, -103, 88, -69, 47, -24 )
    );
    variable TempBlock : RF;
    variable A, B, P, Sum : INTEGER;
  begin
    if input_rdy_detected = '0' then
      wait until input_rdy_detected = '1';
    end if;
    InpBlock := InBlock; -- Broadside dump or swap
    --TempBlock = COSBLOCK * InBlock
    -- arbitrarily make matrix multiply 2 clocks long
    wait until clk = '1' and clk'event; -- 1st xfm clock
    for i in 0 to 7 loop
      for j in 0 to 7 loop
        Sum := 0;
        for k in 0 to 7 loop
          A := COSBlock( i, k );
          B := InpBlock( k, j );
          P := A * B;
          Sum := Sum + P;
          if( k = 7 ) then
            TempBlock( i, j ) := Sum;
          end if;
        end loop;
      end loop;
    end loop;
    -- Done issued in clk cycle of last TempBlock( i, j ) := Sum;
    internal_Done <= '1', '0' after 11 ns;
    wait until clk = '1' and clk'event; -- 2nd xfrm clk
    -- OutBlock available after last TempBlock value stored
    OutBlock <= TempBlock; -- Broadside dump or swap
  end process;

  Done <= internal_Done;

  -- Done flip-flop
  process
  begin
    wait until clk = '1' and clk'event;
    done_detected <= internal_Done;
    -- Done can come either before the first output_data transfer
    -- or during the last output data transfer
    -- this gives us the clock delay to finish the last xfm transfer to
    -- TempBlock( i, j)
    -- Technically part of the output process but was too cumbersome to write
  end process;

  -- Output process
  -- OutBlock is valid after clock edge when Done is true
  process
  begin
    for i in 0 to 7 loop
      for j in 0 to 7 loop
        if i = 0 and j = 0 then
          if done_detected = '0' then
            wait until done_detected = '1';
          end if;
        end if;
        Dout <= OutBlock(i,j);
        wait until clk = '1' and clk'event;
      end loop;
    end loop;
  end process;

end behavioral;
The type definition for RF has been moved to the architecture declarative part to allow inter-process communication through signals. The input loop, matrix multiply and output loop are in their own processes. I also added processes for the inter-process handshaking (Input_Ready, and internal_Done, which drives Done), with the added signals input_rdy_detected and done_detected.
Because a process can take 64 clocks, the signals marking the last datum (Input_Ready, and potentially Done) are asserted during the last data transaction, for the downstream process to pick up. It would be very messy to code otherwise, and you'd still need the flip-flops.
There's an added RF between the input process and the multiply process to allow concurrent operation when the matrix multiply takes real time (it takes 2 clocks in this example; I didn't want to stretch out the waveforms too far).
Some of the handshaking delays appear to have been coding-style related, and were cured with the input_rdy_detected and done_detected flip-flops.
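One coding-style detail in that handshaking is the `if ... = '0' then wait until ... = '1'; end if;` guard used in the multiply and output processes. My reading (not stated in the answer) is that the guard is there because `wait until` only evaluates its condition when a signal in it has an event, so an unguarded wait on a signal that is already '1' blocks until the next transition. A minimal sketch, with a hypothetical entity wait_until_guard_demo and a hypothetical ready signal standing in for input_rdy_detected; the `for 30 ns` timeout only keeps the demo from hanging.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical demo: unguarded vs. guarded "wait until" on an already-high signal
entity wait_until_guard_demo is
end entity wait_until_guard_demo;

architecture sim of wait_until_guard_demo is
  signal ready : std_logic := '0';  -- stands in for input_rdy_detected
begin
  ready <= '1' after 5 ns;  -- rises once, then stays '1'

  check: process
  begin
    wait for 20 ns;  -- ready has been '1' since 5 ns

    -- Unguarded: the condition is only re-evaluated on an event on ready,
    -- so this waits for a new event even though ready is already '1'
    wait until ready = '1' for 30 ns;
    assert now = 50 ns
      report "expected the unguarded wait to time out" severity failure;

    -- Guarded (the answer's idiom): falls straight through when already '1'
    if ready = '0' then
      wait until ready = '1';
    end if;
    assert now = 50 ns
      report "guarded wait should not block" severity failure;

    report "unguarded wait timed out at 50 ns; guarded wait did not block";
    wait;
  end process;
end architecture sim;
```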
The first waveform diagram shows the first output data following the two clocks the transform process now takes, between the A and B markers.
You can see that the first output datum immediately following Done is 78540, not the 110415 shown in your waveform screen capture. One of us shows the wrong value. This version of DCT_beh strictly enforces transfers of RF values only after the last datum is loaded.
I did get the 110415 value before cleaning up the handshaking between the input process and the multiply process. It'd be a lot of work to trace it through TempBlock or OutBlock.
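For reference, the expected (0,0) entry for the 255/0 block can be computed by hand, assuming the missing else branch drives 0 and the data are loaded row-major as in DCT_beh's input loop. Only column 0 of InBlock matters for TempBlock(0,0), and its entries come from data 0, 8, ..., 56. The sketch (hypothetical entity check_255_block) asserts the result. Its second assert only records an arithmetic fact: 78540 is what the same sum gives if InBlock(0,0) is 0 instead of 255. It does not show what either simulation actually loaded.

```vhdl
-- Hypothetical self-checking harness: pure arithmetic, no DCT_beh involved
entity check_255_block is
end entity check_255_block;

architecture sim of check_255_block is
  type row8 is array (0 to 7) of integer;
  -- Row 0 of COSBlock
  constant COSRow0 : row8 := (125, 122, 115, 103, 88, 69, 47, 24);
begin
  process
    variable n, datum : integer;
    variable Sum      : integer := 0;
  begin
    for k in 0 to 7 loop
      n := 8 * k;  -- index of the datum that lands in InBlock(k, 0)
      -- Stimulus from the EDIT, assuming the missing else drives 0
      if n < 24 or n > 40 then
        datum := 255;
      else
        datum := 0;
      end if;
      Sum := Sum + COSRow0(k) * datum;
    end loop;
    assert Sum = 110415
      report "unexpected TempBlock(0,0) = " & integer'image(Sum)
      severity failure;
    -- Same sum with InBlock(0,0) = 0 instead of 255 (drops 255 * 125)
    assert Sum - 255 * COSRow0(0) = 78540 severity failure;
    report "TempBlock(0,0) = " & integer'image(Sum);
    wait;
  end process;
end architecture sim;
```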
Now for the good news. The second input block is taken from your original stimulus and the input values make a great index for the output transfers. Those output data values all appear correct.
The signals input_rdy_detected and done_detected happen to show the first transaction in their respective downstream processes. I added a trailing assignment of 0 to din to avoid confusion at the end of the second input block.
Here's a screen capture approximating yours. I can't do a selected zoom, so I used successive approximation instead.
You only need to run the simulation out to 1955 ns to capture the last datum of the 2nd block being out.
This was done using Tristan Gingold's ghdl and Tony Bybell's gtkwave on a Mac running OS X 10.8.4.