GPU Memory Read Instruction Flow, Operand collector

Question

I am trying to learn the architecture of a GPU with GPGPU-Sim, and I am confused about the flow of memory operations. Let's say I have an arithmetic instruction like a = b + c. Before doing the calculation, memory load operations are required for b and c. Load instructions for these are sent to the memory hierarchy. First of all, the cache tags are checked.

In case of a miss, the request is added to the MSHR and sent to the lower memory level from the GPU core via the interconnection network. When the response returns to the core from the interconnection network, it is added to some kind of memory response FIFO. The cache lines are then filled by ejecting those responses from the response FIFO.

In case of a hit, the data is already available in the cache.
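To make the hit and miss paths above concrete, here is a toy Python sketch of how I picture them. It is not GPGPU-Sim code: the class `ToyL1Cache`, its method names, and the merging of a second miss to the same line into an existing MSHR entry are all my own assumptions for illustration.

```python
from collections import deque


class ToyL1Cache:
    """Toy model of the tag check, MSHR, and response FIFO (hypothetical names)."""

    def __init__(self):
        self.lines = {}                 # line address -> data (stands in for tags + data)
        self.mshr = {}                  # line address -> warp ids waiting on that line
        self.to_interconnect = deque()  # miss requests leaving the core
        self.response_fifo = deque()    # replies that came back from lower memory

    def access(self, line_addr, warp_id):
        """Check the tags; on a miss, record the request in the MSHR."""
        if line_addr in self.lines:
            return "hit"
        if line_addr in self.mshr:
            # Assumption: a miss to a line that is already in flight is merged.
            self.mshr[line_addr].append(warp_id)
            return "miss_merged"
        self.mshr[line_addr] = [warp_id]
        self.to_interconnect.append(line_addr)
        return "miss"

    def receive_response(self, line_addr, data):
        """A reply arrives from the interconnect and waits in the response FIFO."""
        self.response_fifo.append((line_addr, data))

    def fill(self):
        """Eject one reply from the FIFO, fill the line, return the released warps."""
        if not self.response_fifo:
            return []
        line_addr, data = self.response_fifo.popleft()
        self.lines[line_addr] = data
        return self.mshr.pop(line_addr, [])
```

For example, two warps that touch the same line before it arrives produce a single request on `to_interconnect`, and a later access to that line after `fill()` is a hit.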

In both cases, the data for the arithmetic units ends up available in the cache. I know that the operand collector collects the required operands for issuing warps, but the part that confuses me is where the operand collector collects those operands from. Per-thread registers? If so, when do these registers get the required data from the caches?

menderft · Accepted Answer

Found the answer. One memory response is popped from the memory response FIFO each cycle, as long as the FIFO is not empty and the writeback stage is not stalled. The popped response is written to the single-ported register file banks. The SIMD execution units read the registers required by arithmetic instructions from those register file banks when needed. Information about the operand collector and the register file banks is available online, and the design is patented by NVIDIA.
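The writeback step described above can be sketched as a toy model in Python. This is only an illustration under my own assumptions, not GPGPU-Sim code: `NUM_BANKS`, the bank mapping in `bank_of`, and the function names are hypothetical, and the single-port restriction on each bank is not modeled.

```python
from collections import deque

NUM_BANKS = 4  # hypothetical bank count, chosen only for the example


class ToyRegisterFile:
    """Register file split into banks (port conflicts are not modeled)."""

    def __init__(self):
        self.banks = [dict() for _ in range(NUM_BANKS)]

    def bank_of(self, warp_id, reg):
        # Hypothetical mapping from (warp, register) to a bank.
        return (warp_id + reg) % NUM_BANKS

    def write(self, warp_id, reg, value):
        self.banks[self.bank_of(warp_id, reg)][(warp_id, reg)] = value

    def read(self, warp_id, reg):
        return self.banks[self.bank_of(warp_id, reg)][(warp_id, reg)]


def writeback_cycle(response_fifo, regfile, stalled):
    """Pop at most one response per cycle and write it to its register bank."""
    if stalled or not response_fifo:
        return False
    warp_id, reg, value = response_fifo.popleft()
    regfile.write(warp_id, reg, value)
    return True


def collect_operands(regfile, warp_id, src_regs):
    """Gather source operands from the register banks for one instruction."""
    return [regfile.read(warp_id, r) for r in src_regs]
```

For a = b + c, the two load responses for b and c take two writeback cycles to reach the register file. Only after that can `collect_operands(regfile, warp_id, [reg_b, reg_c])` gather both values for the add, which is why the operands come from the registers rather than directly from the cache.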
