How much data is loaded into the L2 and L3 caches?

Question

If I have this class:

class MyClass{
    short a;
    short b;
    short c;
};

and I have this code performing calculations on the above:

std::vector<MyClass> vec;
// ... fill vec ...
for (auto x : vec) {
    sum = x.a * (3 + x.b) / x.c;
}

I understand the CPU loads only the data it needs from the L1 cache, but when the L1 cache retrieves data from the L2 cache it loads a whole "cache line" (which could include a few bytes of data it doesn't need).

How much data does the L2 cache load from the L3 cache, and the L3 cache load from main memory? Is it defined in terms of pages and if so, how would this answer differ according to different L2/L3 cache sizes?

user2467198 · Accepted Answer

L2 and L3 caches also use cache lines, and those lines are smaller than a virtual memory page, so transfers between cache levels are not defined in terms of pages. The L2 and L3 cache line size is greater than or equal to the L1 cache line size, and it is not uncommon for it to be twice the L1 line size.

For recent x86 processors, all caches use the same 64-byte cache line size. (Early Pentium 4 processors had 64-byte L1 cache lines and 128-byte L2 cache lines.)
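To relate this to the class in the question, here is a rough sketch of how many MyClass objects share one line. It assumes a 64-byte line, that short is 2 bytes (so the object is 6 bytes with no padding, which is typical but not guaranteed), and that the array starts on a line boundary. The helper names are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>

struct MyClass {
    short a;
    short b;
    short c;
};

// Assumed line size; matches the 64-byte figure for recent x86 parts.
constexpr std::size_t kLineBytes = 64;

// Number of whole objects that fit in one cache line.
inline std::size_t objects_per_line(std::size_t object_bytes,
                                    std::size_t line_bytes) {
    assert(object_bytes != 0 && object_bytes <= line_bytes);
    return line_bytes / object_bytes;
}

// True if the object at array index `index` spans two cache lines,
// assuming element 0 starts exactly on a line boundary.
inline bool straddles_line(std::size_t index, std::size_t object_bytes,
                           std::size_t line_bytes) {
    const std::size_t first_byte = index * object_bytes;
    const std::size_t last_byte = first_byte + object_bytes - 1;
    return first_byte / line_bytes != last_byte / line_bytes;
}
```

With 6-byte objects, objects_per_line(6, kLineBytes) is 10, and the element at index 10 occupies bytes 60 through 65, so it straddles two lines. Iterating over the vector therefore touches a new line roughly every ten or eleven elements, regardless of which cache level supplies it.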

IBM's POWER7 uses 128-byte cache blocks in L1, L2, and L3. (However, POWER4 used 128-byte blocks in L1 and L2, but sectored 512-byte blocks in the off-chip L3. Sectored blocks provide a valid bit for subblocks. For L2 and L3 caches, sectoring allows a single coherence size to be used throughout the system.)
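The sectoring idea can be modelled in a few lines. This is only a toy sketch, not how any real controller is implemented: it assumes a 512-byte block split into four 128-byte sectors (consistent with the sizes above), and the struct and function names are invented for illustration. One tag covers the whole block, but each sector has its own valid bit, so a sector can be filled without fetching the other three.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::uint64_t kBlockBytes = 512;   // assumed sectored block size
constexpr std::uint64_t kSectorBytes = 128;  // assumed sector (coherence) size
constexpr std::size_t kSectors = kBlockBytes / kSectorBytes;

// One tag for the whole block, one valid bit per sector.
struct SectoredBlock {
    std::uint64_t tag = 0;
    bool tag_valid = false;
    std::bitset<kSectors> sector_valid;
};

inline std::uint64_t block_tag(std::uint64_t addr) {
    return addr / kBlockBytes;
}

inline std::size_t sector_index(std::uint64_t addr) {
    return static_cast<std::size_t>((addr % kBlockBytes) / kSectorBytes);
}

// A hit needs a matching tag AND a valid bit for the addressed sector.
inline bool is_hit(const SectoredBlock& block, std::uint64_t addr) {
    return block.tag_valid && block.tag == block_tag(addr) &&
           block.sector_valid[sector_index(addr)];
}

// Fill only the addressed sector; a tag change invalidates all sectors.
inline void fill_sector(SectoredBlock& block, std::uint64_t addr) {
    if (!block.tag_valid || block.tag != block_tag(addr)) {
        block.tag = block_tag(addr);
        block.tag_valid = true;
        block.sector_valid.reset();
    }
    block.sector_valid.set(sector_index(addr));
}
```

After fill_sector(block, 0x1000), an access to 0x107F hits (same sector), while 0x1080 misses even though the tag matches, because that sector has not been fetched yet.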

Using a larger cache line size in the last-level cache reduces tag overhead and facilitates long burst accesses between the processor and main memory (longer bursts can provide more bandwidth and facilitate more extensive error correction and DRAM chip redundancy), while allowing other levels of cache and cache coherence to use smaller chunks, which reduces bandwidth use and capacity waste. (Large last-level cache blocks also provide a prefetching effect whose cache-polluting issues are less severe because of the relatively high capacity of last-level caches. However, hardware prefetching can accomplish the same effect with less waste of cache capacity.)

With a smaller cache (e.g., a typical L1 cache), evictions happen more frequently, so the time span in which spatial locality can be exploited is smaller (i.e., it is more likely that only data in one smaller chunk will be used before the cache line is evicted). A larger cache line also reduces the number of blocks available, in some sense reducing the capacity of the cache; this capacity reduction is particularly problematic for a small cache.
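The tag-overhead and block-count trade-off can be made concrete with some arithmetic. The sketch below uses hypothetical parameters that are not taken from any specific processor: an 8 MiB, 16-way set-associative cache and 48-bit physical addresses. It counts only address tag bits and ignores per-line state such as valid, dirty, and coherence bits.

```cpp
#include <cassert>
#include <cstdint>

// log2 of a power of two.
inline unsigned log2_exact(std::uint64_t value) {
    assert(value != 0 && (value & (value - 1)) == 0);
    unsigned bits = 0;
    while (value > 1) {
        value >>= 1;
        ++bits;
    }
    return bits;
}

// Hypothetical cache parameters, chosen for illustration only.
struct CacheGeometry {
    std::uint64_t capacity_bytes;
    std::uint64_t line_bytes;
    std::uint64_t ways;
    unsigned address_bits;
};

// Number of lines (blocks) the cache can hold.
inline std::uint64_t line_count(const CacheGeometry& g) {
    return g.capacity_bytes / g.line_bytes;
}

// Tag width = address bits minus set-index bits minus line-offset bits.
inline unsigned tag_bits_per_line(const CacheGeometry& g) {
    const std::uint64_t sets = line_count(g) / g.ways;
    return g.address_bits - log2_exact(sets) - log2_exact(g.line_bytes);
}

// Total tag storage in bits across the whole cache.
inline std::uint64_t total_tag_bits(const CacheGeometry& g) {
    return line_count(g) * tag_bits_per_line(g);
}
```

For these assumed parameters, going from 64-byte to 128-byte lines halves the line count (131072 to 65536) while the tag width stays at 29 bits, so total tag storage is halved. The same halving of the block count is what hurts a small cache, which has few blocks to begin with.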
