Zuchao Wang

Reputation: 19

In pipelined processor design, why are register file reads and writes each performed in half a cycle?

I'm reading about pipelined MIPS processor design in the book "Digital Design and Computer Architecture (Second Edition)" by David Money Harris and Sarah L. Harris.

In section 7.5.3 "Hazards" it says (page 415):

The register file can be read and written in the same cycle. The write takes place during the first half of the cycle and the read takes place during the second half of the cycle, so a register can be written and read back in the same cycle without introducing a hazard.

My question is: why can't the register file just get read and written simultaneously?

Actually, my question is quite similar to this stackexchange one, but the answers there do not fully clear it up for me. I'm also not allowed to comment there due to lack of reputation, so I'm starting this question.

For an SRAM register file with 2 read ports and 1 write port, as shown on Wikipedia, it seems perfectly legal to read and write the same address simultaneously. The write operation will make the bits stored in the cross-coupled inverters unstable for a while, but as long as the clock cycle of the pipelined processor is long enough, the bits will stabilize. Therefore the read operation, which is fully combinational, can get the correct data. Then why not read and write simultaneously?

My second question is: if we must use a register file as suggested by the book, one that writes in the first half of the cycle and reads in the second half, how do we implement it as a circuit?

My naive solution is to redefine the write_enable and read_enable signals of the register file: let write_enable = write_enable & clock and read_enable = read_enable & ~clock. But the book seems to suggest writing on the falling edge; see the code comment in HDL example 7.6, the register file (page 435):

for pipelined processor, write third port on falling edge of clk

I would assume a clock cycle starts with 1 in the first half, then drops to 0 in the second half. Therefore I feel writing on the falling edge actually results in writing in the second half of the clock cycle, not the first half. What's more, it does nothing to ensure reading in the second half of the cycle. How can it work?

Thanks in advance.

Upvotes: 1

Views: 1732

Answers (1)

Jer

Reputation: 1

1. Ultimately, it is because writing before reading within the same cycle lets a dependent instruction execute with one less stall (no-op).

Think about the following example:

ADD R5, R2, R1
SW R5, 32(R1)
SUB R3, R5, R0

Let's first try the status quo: write first, read second.

<!DOCTYPE html>
<html>
<head>
  <meta charset="UTF-8">
  <style>
    table {
      border-collapse: collapse;
      width: 100%;
      margin-top: 20px;
    }
    th, td {
      border: 1px solid #000;
      padding: 8px;
      text-align: center;
      min-width: 60px;
      position: relative;
    }
    th {
      background-color: #ddd;
    }
    .bubble {
      background-color: #fdd;
      font-style: italic;
    }
    .circle {
      display: inline-block;
      padding: 5px;
      border: 2px solid red;
      border-radius: 50%;
      font-weight: bold;
    }
  </style>
</head>
<body>
  <table>
    <tr>
      <th>Instruction</th>
      <th>Cycle 1</th>
      <th>Cycle 2</th>
      <th>Cycle 3</th>
      <th>Cycle 4</th>
      <th>Cycle 5</th>
      <th>Cycle 6</th>
      <th>Cycle 7</th>
      <th>Cycle 8</th>
      <th>Cycle 9</th>
    </tr>
    <tr>
      <td>ADD R5, R2, R1</td>
      <td>IF</td>
      <td>ID</td>
      <td>EX</td>
      <td>MEM</td>
      <!-- Wrap WB in a span to circle it -->
      <td><span class="circle">WB</span></td>
      <td></td>
      <td></td>
      <td></td>
      <td></td>
    </tr>
    <!-- Two stall cycles (bubbles) inserted after ADD -->
    <tr class="bubble">
      <td>Stall</td>
      <td></td>
      <td>Stall</td>
      <td></td>
      <td></td>
      <td></td>
      <td></td>
      <td></td>
      <td></td>
      <td></td>
    </tr>
    <tr class="bubble">
      <td>Stall</td>
      <td></td>
      <td></td>
      <td>Stall</td>
      <td></td>
      <td></td>
      <td></td>
      <td></td>
      <td></td>
      <td></td>
    </tr>
    <tr>
      <td>SW R5, 32(R1)</td>
      <td></td>
      <td></td>
      <td></td>
      <td>IF</td>
      <!-- Wrap ID in a span to circle it -->
      <td><span class="circle">ID</span></td>
      <td>EX</td>
      <td>MEM</td>
      <td>WB</td>
      <td></td>
    </tr>
    <tr>
      <td>SUB R3, R5, R0</td>
      <td></td>
      <td></td>
      <td></td>
      <td></td>
      <td>IF</td>
      <td>ID</td>
      <td>EX</td>
      <td>MEM</td>
      <td>WB</td>
    </tr>
  </table>
</body>
</html>

Notice that ADD's WB (circled in red) happens in the same clock cycle as SW's ID (also circled in red). This is possible because the register file first writes ADD's result, R2+R1, into R5, and only then does SW read R5, so there is no data hazard.

Now let's try read first, write second. To avoid the data hazard, SW must read R5 only after it has been updated, so ADD has to finish WB (writeback) before SW can read R5:

<!DOCTYPE html>
<html>
<head>
  <meta charset="UTF-8">
  <style>
    table {
      border-collapse: collapse;
      width: 100%;
      margin-top: 20px;
    }
    th, td {
      border: 1px solid #000;
      padding: 8px;
      text-align: center;
      min-width: 60px;
      position: relative;
    }
    th {
      background-color: #ddd;
    }
    .bubble {
      background-color: #fdd;
      font-style: italic;
    }
    .circle {
      display: inline-block;
      padding: 5px;
      border: 2px solid red;
      border-radius: 50%;
      font-weight: bold;
    }
  </style>
</head>
<body>
  <table>
    <tr>
      <th>Instruction</th>
      <th>Cycle 1</th>
      <th>Cycle 2</th>
      <th>Cycle 3</th>
      <th>Cycle 4</th>
      <th>Cycle 5</th>
      <th>Cycle 6</th>
      <th>Cycle 7</th>
      <th>Cycle 8</th>
      <th>Cycle 9</th>
    </tr>
    <tr>
      <td>ADD R5, R2, R1</td>
      <td>IF</td>
      <td>ID</td>
      <td>EX</td>
      <td>MEM</td>
      <!-- Wrap WB in a span to circle it -->
      <td><span class="circle">WB</span></td>
      <td></td>
      <td></td>
      <td></td>
      <td></td>
    </tr>
    <!-- Three stall cycles (bubbles) inserted after ADD -->
    <tr class="bubble">
      <td>Stall</td>
      <td></td>
      <td>Stall</td>
      <td></td>
      <td></td>
      <td></td>
      <td></td>
      <td></td>
      <td></td>
      <td></td>
    </tr>
    <tr class="bubble">
      <td>Stall</td>
      <td></td>
      <td></td>
      <td>Stall</td>
      <td></td>
      <td></td>
      <td></td>
      <td></td>
      <td></td>
      <td></td>
    </tr>
    <tr class="bubble">
      <td>Stall</td>
      <td></td>
      <td></td>
      <td></td>
      <td>Stall</td>
      <td></td>
      <td></td>
      <td></td>
      <td></td>
      <td></td>
    </tr>
    <tr>
      <td>SW R5, 32(R1)</td>
      <td></td>
      <td></td>
      <td></td>
      <td></td>
      <td>IF</td>
      <!-- Wrap ID in a span to circle it -->
      <td><span class="circle">ID</span></td>
      <td>EX</td>
      <td>MEM</td>
      <td>WB</td>
    </tr>
    <tr>
      <td>SUB R3, R5, R0</td>
      <td></td>
      <td></td>
      <td></td>
      <td></td>
      <td></td>
      <td>IF</td>
      <td>ID</td>
      <td>EX</td>
      <td>MEM</td>
    </tr>
  </table>
</body>
</html>

Notice that this time SW's ID has to execute one clock cycle later. Because R5 is not written before the read within the same cycle, an extra stall is needed so that SW reads the register in the following clock cycle.
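The stall counts in the two tables can be reproduced with a few lines of arithmetic. This is only a minimal sketch, assuming a 5-stage pipeline (IF, ID, EX, MEM, WB) with no forwarding; the function name `stalls_needed` and its parameters are made up for illustration.

```python
# Sketch under assumptions: 5-stage pipeline, no forwarding, one instruction
# fetched per cycle unless stalled. Names here are hypothetical.

def stalls_needed(producer_if_cycle: int,
                  consumer_if_cycle: int,
                  write_before_read: bool) -> int:
    """Return how many stall cycles the consumer needs before its ID stage."""
    # The producer reaches WB four stages after IF (IF, ID, EX, MEM, WB).
    producer_wb = producer_if_cycle + 4
    # Without any stall, the consumer would decode one cycle after its fetch.
    unstalled_id = consumer_if_cycle + 1
    # Write-first/read-second: ID may share the producer's WB cycle.
    # Read-first/write-second: ID must wait until the cycle after WB.
    earliest_id = producer_wb if write_before_read else producer_wb + 1
    return max(0, earliest_id - unstalled_id)


# ADD is fetched in cycle 1; SW would normally be fetched in cycle 2.
print(stalls_needed(1, 2, write_before_read=True))
print(stalls_needed(1, 2, write_before_read=False))
```

With ADD fetched in cycle 1 and SW right behind it, the write-first policy gives two stalls and the read-first policy gives three, matching the two tables.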

2. Let's look at the implementation of a register file.

Implementation of a 2^2 x 5 Register File

This implementation is from a lab of UC Riverside.

Let's look at the circuit. There are essentially two paths: write and read. At the clock edge, both the write and the read lines pass in their current values (data and addresses, if enabled).

Let's look at the read path first. The data from the selected registers passes through a driver and onto the 32-bit bus. Now the write path: the data is loaded into the corresponding register. But if you follow that bus, you will see that the written data travels onto the same path as the read data. Therefore, even if the read went through first, the write data will overwrite it on the register file's data bus.

However, this design does not take a clock input, so I cannot fully answer your question. Please let me know if you find a register file circuit that is clocked. I can imagine AND gates between the clock and write_enable or read_enable, as you mentioned in your question. I will also do more research on how register files are implemented in real hardware, especially synchronous ones.
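To make the ordering concrete, here is a small behavioral model in Python rather than a circuit. It is a sketch under assumptions, not the book's or the lab's design: the class `RegisterFile`, the helper `run_cycle`, and the idea of splitting a cycle into two sequential phases are all illustrative choices.

```python
# Behavioral sketch only; names and structure are hypothetical.

class RegisterFile:
    def __init__(self, num_regs: int = 32):
        self.regs = [0] * num_regs

    def read(self, addr: int) -> int:
        # Combinational read: returns whatever is stored right now.
        return self.regs[addr]

    def write(self, enable: bool, addr: int, data: int) -> None:
        # The write only takes effect when write enable is asserted.
        if enable:
            self.regs[addr] = data


def run_cycle(rf: RegisterFile, we: bool, wa: int, wd: int,
              ra: int, write_first: bool) -> int:
    """Simulate one clock cycle and return the value the read port delivers."""
    if write_first:
        rf.write(we, wa, wd)      # first half of the cycle: write
        return rf.read(ra)        # second half of the cycle: read
    value = rf.read(ra)           # first half of the cycle: read (stale value)
    rf.write(we, wa, wd)          # second half of the cycle: write
    return value


rf = RegisterFile()
print(run_cycle(rf, True, 5, 42, 5, write_first=True))
```

With `write_first=True`, a read of R5 in the same cycle returns the value just written. With `write_first=False`, it returns the old contents, and the new value only becomes visible in the next cycle.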

I have found this website that animates the data path to better understand what the pipeline does at each instruction.

Also, your textbook is very detailed, especially on this topic in chapter seven. Pity that it doesn't show a schematic of a register file anywhere in the book.

Upvotes: 0
