Reputation: 2548
We learned all the main details about control lines and the general functionality of the MIPS chip in single cycle and also with pipelining.
But in the multicycle design the control lines aren't identical, in addition to other changes.
Specifically, what do the TargetWrite (ALUout) and IorD control lines actually modify? Based on my analysis, TW seems to modify where the PC points, depending on the bits it receives (for Jump, Branch, or simply moving to the next line)... Am I missing something?
Also, what exactly does the IorD line do? I looked at both course textbooks, See MIPS Run and Computer Architecture: A Quantitative Approach by Patterson and Hennessy, and neither seems to mention these lines...
Upvotes: 0
Views: 606
Reputation: 26686
First, let's note that this block diagram does not have separate instruction memory and data memory. That means that it either has a unified cache or goes directly to memory. Most other block diagrams for MIPS have separate, dedicated Instruction Memory (cache) and Data Memory (cache). The advantage of the separate memories is that the processor can read instructions and read/write data in parallel. In a simple version of a multicycle processor, there is likely no need to read instructions and data in parallel, so a unified cache simplifies the hardware.
So, what IorD is doing is selecting the source of the address provided to the Memory: either the address for an instruction fetch, or the address for a data read/write.
When IorD=0, the PC provides the address from which to read (i.e. instruction fetch), and when IorD=1, the ALU provides the address to read/write data from. For data operations, the ALU computes a base + displacement addressing mode, Reg[rs] + SignExt32(imm16), as the effective address to use for the data read or write operation.
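As a rough illustration of the mux described above, here is a minimal Python sketch. The function names (memory_address, sign_ext16) and the example addresses are made up for illustration; they are not taken from the diagram.

```python
def sign_ext16(imm16):
    """Sign-extend a 16-bit immediate to a Python int."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def memory_address(ior_d, pc, alu_out):
    """Model the IorD mux: 0 selects the PC, 1 selects the ALU result."""
    return alu_out if ior_d else pc


# Instruction fetch: IorD=0, so the memory address comes from the PC.
pc = 0x00400000  # hypothetical PC value
fetch_addr = memory_address(0, pc, alu_out=0)

# Data access (e.g. lw/sw): IorD=1, so the address comes from the ALU,
# which has computed Reg[rs] + SignExt32(imm16).
reg_rs = 0x10010000  # hypothetical base register contents
imm16 = 0xFFFC       # 16-bit encoding of -4
effective = (reg_rs + sign_ext16(imm16)) & 0xFFFFFFFF
data_addr = memory_address(1, pc, effective)

print(hex(fetch_addr), hex(data_addr))
```

The same Memory block serves both cases; only the select line changes between the fetch cycle and the data cycle.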
Further, let's note that this block diagram does not contain a separate adder for incrementing the PC by 4, whereas most other block diagrams do. Look up any of the first few MIPS single cycle datapath images, and you'll see the dedicated adder for that PC increment. Using a dedicated adder allows the PC to be incremented in parallel with operations done by the ALU, whereas omitting that dedicated adder means that the main ALU must perform the increment of the PC. However, this probably saves transistors in a simple version of a multicycle implementation, where the ALU is not in use every cycle and so can be put to other uses.
Since Target has a control TargetWrite, we might presume this is an internal register that is useful for buffering the intended branch target address, for example, if the branch target is computed in one cycle and finally used in another.
(I thought this could be about buffering for a branch delay slot implementation (since those branches are delayed by one instruction), but were that the case, the J-Type instructions would also have gone through Target, and they don't.)
So, it looks to me like the machinery there for this multicycle processor is to handle the branch instructions, say beq, which has to: compute the next sequential PC address from PC + 4; compute the branch target address (PC+4) + SignExt32(imm16); and compute the branch condition (is Reg[rs] == Reg[rt]?).
But in what order would they be computed? It is clear from the control signals in state 0 that PC+4 is computed first, and written back to the PC, for all instructions (i.e. for branches, whether the branch is taken or not).
It seems to me that in the next cycle, (PC+4) + SignExt32(imm16) is computed (by reusing the prior PC+4, which is now in the PC register), and this result is stored in Target to buffer the value, since the processor doesn't yet know whether the branch is taken. In the following cycle, the contents of rs and rt are compared for equality. If they are equal, the branch should be taken, so PCSource=1, PCWrite=1 selects the Target from the buffer to update the PC. If the branch is not taken, since the PC has already been updated to PC+4, that PC+4 stands (PCWrite=0, PCSource=don't care) for the start of the next instruction. In either case, the next instruction runs with whatever address the PC holds.
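To make that sequence concrete, here is a small Python sketch that steps a beq through the three cycles as I read them. It is a guess at the behavior, not a transcription of the diagram: the function name run_beq and the trace labels are invented, and the shift of the offset left by 2 comes from the MIPS definition of beq rather than from the formula written above.

```python
def sign_ext16(imm16):
    """Sign-extend a 16-bit immediate to a Python int."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def run_beq(pc, reg_rs, reg_rt, imm16):
    """Step a beq through three cycles; return (final PC, trace)."""
    trace = []

    # Cycle 0 (fetch): the main ALU computes PC+4, which is written
    # back to the PC unconditionally.
    pc = (pc + 4) & 0xFFFFFFFF
    trace.append(("fetch", pc))

    # Cycle 1: the ALU computes the branch target from the updated PC.
    # TargetWrite=1 latches the result into Target (assumed behavior).
    # MIPS scales the word offset by 4, hence the shift by 2.
    target = (pc + (sign_ext16(imm16) << 2)) & 0xFFFFFFFF
    trace.append(("target", target))

    # Cycle 2: compare rs and rt. Only a taken branch asserts PCWrite,
    # with PCSource selecting Target; otherwise PC+4 stands.
    if reg_rs == reg_rt:
        pc = target
    trace.append(("compare", pc))
    return pc, trace


taken_pc, _ = run_beq(0x00400000, 5, 5, 3)      # registers equal
not_taken_pc, _ = run_beq(0x00400000, 5, 6, 3)  # registers differ
print(hex(taken_pc), hex(not_taken_pc))
```

Note that Target is written in cycle 1 whether or not the branch ends up being taken; the buffered value is simply ignored on the not-taken path.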
Alternatively, since the processor is multicycle, the order of computation could be: compute PC+4 and store it into the PC; compute the branch condition; then decide what kind of cycle to run next. For the not-taken condition, go right to the next instruction fetch cycle (with PC+4 in the PC); for the taken condition, compute (PC+4) + SignExt32(imm16), put that into the PC, and then go on to the next instruction fetch cycle.
This alternative approach would require dynamic alteration of the cycles/states for branches, so it would complicate the multicycle state machine somewhat. It would also not require buffering the branch target in Target, so I think the former is more likely than this alternative.
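The difference between the two orderings can be sketched as the list of states a beq would pass through in each. The state names here are hypothetical labels of my own, not the state numbers from the diagram, and the lists leave out any states the real controller may have beyond the three operations discussed.

```python
def beq_states_buffered(taken):
    """Former ordering: a fixed sequence, with Target filled before the compare."""
    return ["fetch", "compute_target", "compare"]


def beq_states_unbuffered(taken):
    """Alternative ordering: the compare result decides whether another state runs."""
    states = ["fetch", "compare"]
    if taken:
        # Only a taken branch needs the extra cycle that writes the PC directly.
        states.append("compute_target_to_pc")
    return states


for taken in (False, True):
    print(taken, beq_states_buffered(taken), beq_states_unbuffered(taken))
```

In the buffered version the sequence never depends on the data, which is what keeps the state machine simple; in the unbuffered version the next state depends on the comparison, but no Target register is needed.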
Upvotes: 1