Correlated branch prediction

Question

I have this exercise related to correlated predictors that states the following:

A: BEQZ R1, D
…
D: BEQZ R1, F
…
F: NOT R1, R1
G: JUMP A

The prediction works like follows

fetch the current instruction
if it is a branch, determine the current state of the predictor and predict the branch:

a.row is determined by the branch address (in this case either A or D)
b.column is determined by the current global shift register
c.use the value in the cell to determine the prediction from the state machine (current state is saved in the cell)
Execute the branch, and determine the actual decision (Taken: 1, Not Taken: 0):

a.update the cell based on the current state and the actual decision
b.update the global shift register (shift left and add the actual decision bit to right)
goto step 1

I understood the scheme and know that a 2 bit predictor means less errors but I cannot solve this question and I have trouble finding how the solution was found, any help would be appreciated.

Margaret Bloom · Accepted Answer

This a variation of the briefly described Two-level adaptive predictor with global history table in the Agner Fog's microarchitecture paper (page 15).

In this variant, the history register is shared across all branches however the pattern history table is local to a branch¹.

The outcome of the last n (n = 2, in your case) branches is remembered (0 = Not taken, 1 = Taken), ordered from left to right in chronological order, forming a value of n bit that is used, along with the branch address², to index a table of 2-bit saturating counters.

Each counter is incremented if the branch is taken and decremented otherwise (this is the canonical implementation, any 4-state FSA will do).
The meaning of each counter value is:

00b (0) Strongly not taken
01b (1) Weakly not taken
10b (2) Weakly taken
11b (3) Strongly taken

Saturating means than 3+1 (A strongly taken branch is taken again) = 3 and that 0-1 (A strongly not taken branch is again not taken) = 0 while normally arithmetic on registers is modulo 2ⁿ.

In your exercise the assumptions are:

The pattern history table is given as a 2D-table with rows corresponding to the full address of the branch and columns to the value of the global history register.
All counters start in the state 01b (Weakly not taken).
The global history register is 00b at reset.
R1 is 0 at the beginning.

Let's see the first iteration only.

First iteration

The instruction is BEQZ R1, D (a branch, obviously), its address is A.
Since R1 is 0, the branch will be taken (towards D).
Indexing into the table with a global history of 00b and address A gives us the counter value 01b (Weakly not taken) thus the prediction is not taken.

Once the CPU has executed the branch and flushed the mispredicted stage, the table must be updated.
Since the branch was taken, the counter is incremented from 01b to 10b.
Finally, the global history goes from 00b to 01b since the branch is taken (a 1 is shifted in from the right).

Note that the yellow highlighted entries are those read when the corresponding instruction is executed, while the green ones are those updated by the previous prediction.

Thus to see that the counter value has been incremented you have to look at the next row.

Since the branch was taken, the CPU is at D (BEQZ R1, F), this is exactly the same as before, only the global history register has value 01b.

After this instruction is executed the CPU is at F, so R1 becomes 111..11b (the solution just indicates it as 1) and the two above branches are re-executed.

¹This is a simplification, the table is almost always a cache. It's impractical to an entry for each possible memory address where a branch can be found.

²Part of the address is used as the index in the cache, once a set has been selected, the address is again compared with the tag in each way of the set.

Correlated branch prediction

Answers (1)

Related Questions