Reputation: 797
Is a TLB used at all in the instruction fetching pipeline?
Is this architecture- or microarchitecture-dependent?
Upvotes: 2
Views: 954
Typically, yes: a processor that supports paging (and paging schemes usually provide some way to deny execute permission, even if execute cannot be controlled separately from read permission) will access a TLB as part of instruction fetch.
A virtually tagged instruction cache would not require a TLB access, even for permission checks, provided that:

- permissions are checked when a block is inserted into the instruction cache (insertion typically involves a TLB access, though a permission cache could be used with a virtually tagged L2 cache; this applies to prefetches into the instruction cache as well);
- the permission domain is included in the virtual tag (the permission domain is typically the same as an address space identifier, which is useful anyway to avoid flushing the cache on context switches); and
- system software ensures that blocks are removed when execute permission is revoked (or when the permission domain/address space identifier is reused for a different permission domain/address space).
(In general, virtually tagged caches do not need a translation lookaside buffer: a cache of permission mappings is sufficient, or permissions can be cached alongside the tag together with an indication of the permission domain. A TLB would still be used before accessing memory on a miss, but cache hits would not require translation. Permission caching is cheaper than translation caching, both because the granularity can be larger and because fewer bits are needed to express permission information.)
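A toy calculation can make the cost difference concrete. All the numbers below (page size, region size, physical address width) are assumptions chosen for illustration, not taken from any real design:

```python
# Why permission caching is cheaper than translation caching: an
# execute-permission entry needs only a bit and can cover a large region,
# while a TLB entry must hold a full physical frame number per page.
PAGE_BITS = 12    # assumed 4 KiB pages
REGION_BITS = 16  # assumed 64 KiB permission granule
PA_BITS = 40      # assumed 40-bit physical address space

tlb_entry_bits = (PA_BITS - PAGE_BITS) + 3   # frame number + r/w/x bits
perm_entry_bits = 1                          # execute-permitted bit only

# Payload bits needed to cover one 64 KiB region each way:
pages_per_region = 1 << (REGION_BITS - PAGE_BITS)
print(tlb_entry_bits * pages_per_region, "vs", perm_entry_bits)  # 496 vs 1
```

Tag storage is ignored here; the point is only that coarser granularity and narrower payloads compound in the permission cache's favor.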
A physically tagged instruction cache would require address translation to determine a hit, but translation can be delayed significantly by speculating that the access hit (likely using way prediction). Hit determination can be delayed as late as instruction commit/result writeback, though earlier resolution is typically better.
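A minimal sketch of that speculation (the structures and values are invented for the demo, not a real design): the predicted way is read immediately, while the TLB access and physical-tag compare that confirm the hit happen logically later, off the fetch critical path:

```python
BLOCK = 64  # assumed 64-byte cache blocks

def fetch(vaddr, cache, way_pred, translate):
    index = (vaddr // BLOCK) % len(cache)
    way = way_pred[index]              # use the way prediction right away
    spec = cache[index][way]           # speculative fetch from predicted way
    # Verification, logically later in the pipeline:
    ptag = translate(vaddr) // BLOCK   # TLB access for the physical tag
    if spec["tag"] == ptag:
        return spec["data"], True      # speculation confirmed
    return None, False                 # mispredicted: squash and refetch

translate = lambda va: va + 0x1000    # toy translation: fixed offset
cache = [[{"tag": None, "data": None} for _ in range(2)] for _ in range(4)]
cache[0][0] = {"tag": 0x2000 // BLOCK, "data": "insn block @0x1000"}
way_pred = [0, 0, 0, 0]
print(fetch(0x1000, cache, way_pred, translate))
```

On a misprediction real hardware would squash the speculatively fetched instructions and refetch from the correct way (or treat it as a miss), which is why good way-prediction accuracy matters.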
Because instruction accesses typically have substantial spatial locality, a very small TLB can provide a decent hit rate, and a reasonably fast, larger backing TLB can reduce miss costs. Such a microTLB also facilitates sharing one TLB between data and instruction accesses by filtering out most instruction accesses.
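The filtering effect is easy to demonstrate. The sketch below (sizes and the translation function are made up) runs a sequential fetch stream through a 4-entry LRU microTLB backed by a larger shared TLB; almost every fetch hits the microTLB, so only a handful of lookups reach the backing structure:

```python
# A tiny fully associative microTLB (LRU) in front of a larger backing TLB.
from collections import OrderedDict

PAGE_SHIFT = 12  # assumed 4 KiB pages

class TinyTLB:
    def __init__(self, entries):
        self.entries = entries
        self.map = OrderedDict()   # vpn -> pfn, in LRU order
        self.hits = self.misses = 0

    def lookup(self, vaddr, fill):
        vpn = vaddr >> PAGE_SHIFT
        if vpn in self.map:
            self.map.move_to_end(vpn)          # refresh LRU position
            self.hits += 1
            return self.map[vpn]
        self.misses += 1
        pfn = fill(vpn)                        # consult the next level
        if len(self.map) >= self.entries:
            self.map.popitem(last=False)       # evict least recently used
        self.map[vpn] = pfn
        return pfn

micro = TinyTLB(4)
backing = TinyTLB(64)
# Sequential 4-byte fetches walking through 8 pages (0x1000..0x8FFF):
for pc in range(0x1000, 0x9000, 4):
    micro.lookup(pc, lambda vpn: backing.lookup(vpn << PAGE_SHIFT,
                                                lambda v: v + 0x100))
print(micro.hits, micro.misses)  # 8184 hits, 8 misses
```

Only 8 of 8192 fetches (one per page) escape the microTLB, so instruction accesses barely contend with data accesses for the shared backing TLB.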
Obviously, an architecture that does not support paging would not use a TLB (though it might use a memory protection unit to check that accesses are permitted, or a different translation mechanism such as adding an offset, possibly with a bounds check). An architecture oriented toward single-address-space operating systems would probably use virtually tagged caches and so access a TLB only on cache misses.
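The offset-plus-bounds-check alternative mentioned above is classic base-and-bounds relocation; a minimal sketch (values invented for the demo) shows why no TLB is involved:

```python
# Base-and-bounds translation: every fetch address is bounds-checked and
# then relocated by adding a base. No per-page state, hence no TLB.
def translate(vaddr, base, limit):
    if vaddr >= limit:                 # bounds check (MPU-like permission test)
        raise MemoryError("fetch outside permitted region")
    return base + vaddr                # relocation: add the segment base

print(hex(translate(0x42, base=0x10000, limit=0x8000)))  # prints 0x10042
```

Translation here is a single add and compare per access, with no locality-dependent miss behavior, which is precisely what makes it attractive for simple cores that do not need paging.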
Upvotes: 4