Reputation:
I have a question about the following task.
Consider an IA-32 system where the MMU supports a two level page table. The second
level contains 1024 page table entries mapping to 4 KB page frames. Each page table
entry (both levels) has a size of 4 bytes. The system only supports 4 KB page size.
We want to sequentially read 8 MB of consecutive virtual memory, starting with byte 0. We read one word (4 bytes) at a time.
We have an 8 entry data TLB. How many memory accesses are needed to
read the 8 MB of memory specified above?
Does it make a difference, if the TLB has 4 entries instead of 8?
We read sequentially, so there are 8 MB / 4 B = 2M data accesses. With a two-level page table, every access needs two additional page table reads. Therefore, there are 2M + 2*2M = 6M memory accesses without a TLB.
But I don't know how to calculate the number of memory accesses when a TLB is included.
Could anyone explain that to me? That would be very helpful.
Upvotes: 0
Views: 1129
Reputation:
Since the access pattern is a streaming access, each translation is used once for every four-byte word in its page and is never needed again after the stream moves on to the next page. The first access to a page misses in the TLB and the remaining 1023 accesses hit, so 1023 page table look-ups (2046 memory accesses) are avoided per page. (Since there is no overlap in the use of different translations and the reuse is perfectly localized, a single-entry data TLB would perform the same as even a 2048-entry TLB, so it makes no difference whether the TLB has 4 or 8 entries.)
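The counts above can be turned into a total with a few lines of arithmetic. This is only a sketch under the question's stated parameters, and it assumes that the stream starts on a page boundary, that a TLB miss always costs two page table reads, and that nothing other than the TLB caches translations. The constant names are made up for illustration.

```python
# Parameters taken from the question; names are illustrative only.
STREAM_BYTES = 8 * 1024 * 1024   # 8 MiB read sequentially
WORD_BYTES = 4                   # one 4-byte word per load
PAGE_BYTES = 4 * 1024            # 4 KiB pages
WALK_ACCESSES = 2                # assumed: 1 PDE read + 1 PTE read per TLB miss

data_accesses = STREAM_BYTES // WORD_BYTES    # word loads
pages_touched = STREAM_BYTES // PAGE_BYTES    # distinct pages in the stream
tlb_misses = pages_touched                    # one compulsory miss per page
walk_accesses = tlb_misses * WALK_ACCESSES    # page table reads

total_with_tlb = data_accesses + walk_accesses
total_without_tlb = data_accesses * (1 + WALK_ACCESSES)

print(total_with_tlb)      # 2101248
print(total_without_tlb)   # 6291456
```

Under these assumptions the TLB size does not appear anywhere in the calculation, which is another way of seeing that 4 entries and 8 entries give the same result.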
Consider the following description of what is happening for a two-entry direct-mapped data TLB (recognizing that the least significant 12 bits of the virtual address—the offset within the page—are ignored for the TLB and one bit of the virtual address is used to index into the TLB):
load 0x0100_0000; // TLB entry 0 tag != 0x0800 (page # 0x0_1000) [miss]
// 2 memory accesses to fill TLB entry 0
load 0x0100_0004; // TLB entry 0 tag == 0x0800 [hit]
load 0x0100_0008; // TLB entry 0 tag == 0x0800 [hit]
... // 1020 TLB hits in TLB entry 0
load 0x0100_0ffc; // TLB entry 0 tag == 0x0800 [hit]; last word in page
load 0x0100_1000; // TLB entry 1 tag != 0x0800 (page # 0x0_1001) [miss]
// 2 memory accesses to fill TLB entry 1
load 0x0100_1004; // TLB entry 1 tag == 0x0800 [hit]
load 0x0100_1008; // TLB entry 1 tag == 0x0800 [hit]
... // 1020 TLB hits in TLB entry 1
load 0x0100_1ffc; // TLB entry 1 tag == 0x0800 [hit]; last word in page
load 0x0100_2000; // TLB entry 0 tag (0x0800) != 0x0801 (page # 0x0_1002) [miss]
// 2 memory accesses to fill TLB entry 0
load 0x0100_2004; // TLB entry 0 tag == 0x0801 [hit]
load 0x0100_2008; // TLB entry 0 tag == 0x0801 [hit]
... // 1020 TLB hits in TLB entry 0
load 0x0100_2ffc; // TLB entry 0 tag == 0x0801 [hit]; last word in page
load 0x0100_3000; // TLB entry 1 tag (0x0800) != 0x0801 (page # 0x0_1003) [miss]
// 2 memory accesses to fill TLB entry 1
load 0x0100_3004; // TLB entry 1 tag == 0x0801 [hit]
load 0x0100_3008; // TLB entry 1 tag == 0x0801 [hit]
... // 1020 TLB hits in TLB entry 1
load 0x0100_3ffc; // TLB entry 1 tag == 0x0801 [hit]; last word in page
... // the same four-page pattern repeats 510 more times, with the tags advancing
// then the last 4 pages of the 8 MiB stream
load 0x017f_c000; // TLB entry 0 tag (0x0bfd) != 0x0bfe (page # 0x0_17fc) [miss]
// 2 memory accesses to fill TLB entry 0
load 0x017f_c004; // TLB entry 0 tag == 0x0bfe [hit]
load 0x017f_c008; // TLB entry 0 tag == 0x0bfe [hit]
... // 1020 TLB hits in TLB entry 0
load 0x017f_cffc; // TLB entry 0 tag == 0x0bfe [hit]; last word in page
load 0x017f_d000; // TLB entry 1 tag (0x0bfd) != 0x0bfe (page # 0x0_17fd) [miss]
// 2 memory accesses to fill TLB entry 1
load 0x017f_d004; // TLB entry 1 tag == 0x0bfe [hit]
load 0x017f_d008; // TLB entry 1 tag == 0x0bfe [hit]
... // 1020 TLB hits in TLB entry 1
load 0x017f_dffc; // TLB entry 1 tag == 0x0bfe [hit]; last word in page
load 0x017f_e000; // TLB entry 0 tag (0x0bfe) != 0x0bff (page # 0x0_17fe) [miss]
// 2 memory accesses to fill TLB entry 0
load 0x017f_e004; // TLB entry 0 tag == 0x0bff [hit]
load 0x017f_e008; // TLB entry 0 tag == 0x0bff [hit]
... // 1020 TLB hits in TLB entry 0
load 0x017f_effc; // TLB entry 0 tag == 0x0bff [hit]; last word in page
load 0x017f_f000; // TLB entry 1 tag (0x0bfe) != 0x0bff (page # 0x0_17ff) [miss]
// 2 memory accesses to fill TLB entry 1
load 0x017f_f004; // TLB entry 1 tag == 0x0bff [hit]
load 0x017f_f008; // TLB entry 1 tag == 0x0bff [hit]
... // 1020 TLB hits in TLB entry 1
load 0x017f_fffc; // TLB entry 1 tag == 0x0bff [hit]; last word in page
Each page is referenced 1024 times (once for each four-byte element) in sequence and is then never referenced again.
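The trace above can also be checked with a small simulation. The following is a minimal sketch of a direct-mapped TLB like the one in the trace, assuming the low bits of the virtual page number select the entry and the remaining bits form the tag. The function name and parameters are hypothetical, and a real TLB may be organized differently (for example, fully associative).

```python
def count_tlb_misses(start, length, entries, word=4, page=4096):
    """Count misses of a direct-mapped TLB for a sequential word-by-word read."""
    tlb = [None] * entries           # each slot holds the tag it currently caches
    misses = 0
    for addr in range(start, start + length, word):
        vpn = addr // page           # virtual page number (page offset dropped)
        index = vpn % entries        # low VPN bits select the TLB entry
        tag = vpn // entries         # remaining VPN bits are the tag
        if tlb[index] != tag:
            misses += 1              # a miss triggers a page table walk
            tlb[index] = tag         # fill the entry with the new translation
    return misses

# Same 8 MiB stream as in the question, starting at byte 0.
results = {n: count_tlb_misses(0, 8 * 1024 * 1024, n) for n in (1, 4, 8)}
print(results)   # {1: 2048, 4: 2048, 8: 2048}
```

The miss count is one per page regardless of the number of entries, since a streaming read never returns to a page it has left.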
(Now consider a design with four TLB entries and two entries caching page directory entries [each of which holds the pointer to a page of page table entries]. Each cached PDE will be reused for the next 1023 page look-ups, reducing each of them to one memory access. [If the 8 MiB streaming access were repeated as an inner loop and were 4 MiB aligned, a two-entry PDE cache would be fully warmed up after the first iteration, and all subsequent page table look-ups would require only one memory reference.])
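The effect of such a PDE cache on the count can be sketched as follows. This assumes that every page in the stream still takes one TLB miss (one PTE read), that each page directory entry is read from memory only once while the cache is cold, and that the cache is large enough to keep every PDE the stream touches once warm. The helper name is hypothetical.

```python
def walk_accesses(start, length, warm=False, page=4096, ptes_per_table=1024):
    """Page table reads for a sequential stream with a PDE cache (sketch)."""
    first_vpn = start // page
    last_vpn = (start + length - 1) // page
    pte_reads = last_vpn - first_vpn + 1             # one PTE read per page
    first_pde = first_vpn // ptes_per_table
    last_pde = last_vpn // ptes_per_table
    pde_reads = 0 if warm else last_pde - first_pde + 1
    return pde_reads + pte_reads

STREAM = 8 * 1024 * 1024
data_accesses = STREAM // 4                          # 4-byte word loads

first_pass = data_accesses + walk_accesses(0, STREAM)
later_pass = data_accesses + walk_accesses(0, STREAM, warm=True)

print(first_pass)   # 2099202
print(later_pass)   # 2099200
```

With a 4 MiB-aligned start the stream spans exactly two page directory entries, so the first pass needs 2 PDE reads plus 2048 PTE reads, and later passes need only the 2048 PTE reads.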
Upvotes: 1