Reputation: 516
I have recently been studying pipelined processor architecture, mainly in the context of Y86-64. I have just read about branch prediction and how, when a branch is mispredicted, the Fetch, Decode and Execute pipeline registers have to be flushed and the instructions from the correct branch have to be processed instead.
I was wondering whether it is possible to design hardware with, say, 2 sets of pipeline registers, such that when it fetches a conditional branch it starts processing both outcomes in parallel, updating one set of registers as if the branch will not be taken and the other set as if it will be taken.
Notably, a problem arises if one or both of the paths lead to instructions that are themselves conditional branches; then 2 sets are not sufficient. But by the time the first branch reaches the Execute stage, we will know which path to actually take, so we can eliminate the wrong path and all of its sub-paths as well. Since it takes 3 clock cycles for the first branch instruction to get from the Fetch stage to the Execute stage, I would think that we would, in the worst case, only need 2^3 = 8 sets of pipeline registers.
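To make the counting argument above concrete, here is a minimal sketch in Python. It is only a toy model, not a description of any real processor: the function name `peak_speculative_paths` and the parameter `resolve_latency` are made up for illustration. It assumes the worst case where every path fetches a conditional branch on every cycle, that a branch's outcome becomes known `resolve_latency` cycles after it was fetched, and that wrong paths are squashed at the end of that cycle.

```python
def peak_speculative_paths(resolve_latency, cycles=20):
    """Toy model: count how many speculative paths are alive at once.

    Assumptions (hypothetical, for illustration only):
    - every path fetches a conditional branch each cycle (worst case)
    - a branch resolves resolve_latency (>= 1) cycles after it is fetched
    - the real outcome is always "taken" (arbitrary choice)
    """
    # Each path is a tuple of (fetch_cycle, guessed_taken) pairs,
    # one pair per branch on that path that is still unresolved.
    paths = [()]
    peak = 1
    for cycle in range(cycles):
        # Paths in flight during this cycle, before any squashing.
        peak = max(peak, len(paths))

        # The branch fetched resolve_latency cycles ago now resolves.
        resolved = cycle - resolve_latency
        survivors = []
        for path in paths:
            guessed_wrong = any(
                fetched == resolved and not guess for fetched, guess in path
            )
            if guessed_wrong:
                continue  # squash this path and everything it forked
            # Drop the resolved branch from the path's unresolved list.
            survivors.append(
                tuple((f, g) for f, g in path if f != resolved)
            )
        paths = survivors

        # Every surviving path fetches a new branch and forks in two.
        paths = [p + ((cycle, g),) for p in paths for g in (True, False)]
    return peak


if __name__ == "__main__":
    for latency in (1, 2, 3, 4):
        print(latency, peak_speculative_paths(latency))
```

Under these assumptions the peak works out to 2 to the power of `resolve_latency`, which matches the 8 sets estimated above for a latency of 3. A different assumption about exactly when the branch outcome becomes available would change the exponent by one.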
Besides this being somewhat difficult to implement in hardware, is there anything wrong with my assumption that this approach would work? Or is this already being done in more sophisticated architectures such as x86-64?
Thanks.
Upvotes: 2
Views: 296
Reputation: 57774
As for RISC vs. CISC architectures, the latter tried techniques roughly like what you suggest around the late 1980s/early 1990s, as I recall. Wikipedia doesn't have a separate article on branch prediction analysis, but the term redirects to a section of the RSA (encryption) article, which describes a technique that exploits the branch predictor to help find a private encryption key. It also mentions simultaneous multithreading as a way to speed up branch prediction.
To address your question more directly, see the Details section of the simultaneous multithreading article. In general, this seems to be an area of ongoing research and disagreement.
Upvotes: 3