wxz

Reputation: 2546

Confusion about different meanings of "HighMem" in Linux Kernel

I'm trying to understand what "highmem" means but I've seen it used in two different ways and I want to know if one or both are correct.

The two definitions I've gleaned are:

  1. Highmem refers to a specific situation on 32-bit systems, where the machine could hold more than 4GB of RAM but a 32-bit address only lets the kernel address 4GB of memory directly, so any memory above 4GB needed Physical Address Extension (PAE) to reach and was called "highmem". When I see this version of high memory discussed, it's usually mentioned that 64-bit systems no longer have this problem; they can address their physical memory fully, so no notion of "highmem" is needed (see 1, 2, 3). My own 64-bit system doesn't show any highmem memory in /proc/zoneinfo or in free -ml.

  2. Highmem is used to describe the virtual address space that is for user space. This is in contrast with lowmem, the address space that is used for the kernel and is mapped into every user-space program's address space. Another way I've seen this phrased is with the zone names ZONE_HIGHMEM (highmem) and ZONE_NORMAL (lowmem). For instance, with the 32-bit "3/1" user/kernel memory split, the 3GB used for user space would be considered high memory (see 4, 5, 6, 7).

Is one definition more correct than another?

Are both correct but useful in different situations (i.e. definition 1 referring to physical memory, definition 2 referring to virtual memory)?

Upvotes: 2

Views: 2787

Answers (2)

Peter Cordes

Reputation: 366066

I think your examples of usage 2 are actually (sometimes mangled) descriptions of usage 1, or its consequences. There is no separate meaning, it's all just things that follow from not having enough kernel virtual address space to keep all the physical memory mapped all the time.

(So with a 3:1 user:kernel split, you only have 1GiB of lowmem, the rest is highmem, even if you don't need to enable PAE paging to see all of it.)

This article https://cl4ssic4l.wordpress.com/2011/05/24/linus-torvalds-about-pae quotes a Linus Torvalds rant about how much it sucks to have less virtual address space than physical (which is what PAE does), with highmem being the way Linux tries to get some use out of the memory it can't keep mapped.

PAE is a 32-bit x86 extension that switches the CPU to using an alternate page-table format with wider PTEs (the same one adopted by AMD64, including an exec permission bit, and room for up to 52-bit physical addresses, although the initial CPUs to support it only supported 36-bit physical addresses). If you use PAE at all, you use it for all your page tables.


A normal kernel using a high-half-kernel memory layout reserves the upper half of virtual address space for itself, even when user-space is running. Or to leave user-space more room, 32-bit Linux moved to a 3G:1G user:kernel split.

See for example modern x86-64 Linux's virtual memory map (Documentation/x86/x86_64/mm.txt), and note that it includes a 64TB direct mapping of all physical memory (using 1G hugepages), so given a physical address, the kernel can access it by adding that physical address to the base of the direct-mapping region. (kmalloc reserves a range of addresses in this region without having to modify the page tables at all, just the bookkeeping.)
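
To make that "just add an offset" point concrete, here's a minimal C sketch of the direct-map arithmetic. The constant and helper name are illustrative stand-ins, not the kernel's actual __va()/phys_to_virt() code, though those do essentially this:

    #include <stdint.h>

    /* Illustrative only: all of physical memory is mapped contiguously at a
     * fixed virtual base, so turning a physical address into a usable kernel
     * pointer is just an addition -- no page-table changes needed.
     * DIRECT_MAP_BASE stands in for PAGE_OFFSET / page_offset_base. */
    #define DIRECT_MAP_BASE 0xffff888000000000ULL  /* x86-64, 4-level paging, no KASLR */

    static inline uint64_t phys_to_direct_virt(uint64_t phys)
    {
        return DIRECT_MAP_BASE + phys;
    }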

The kernel also wants other mappings of the same pages, for vmalloc kernel memory allocations that are virtually contiguous but don't need to be physically contiguous. And of course for the kernel's static code/data, but that's relatively small.
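
For reference, here's a hedged sketch of the two allocation flavours being contrasted (kmalloc and vmalloc are the real kernel APIs, but the function below is just an illustration, not a working driver):

    #include <linux/slab.h>     /* kmalloc, kfree */
    #include <linux/vmalloc.h>  /* vmalloc, vfree */
    #include <linux/errno.h>

    static int alloc_demo(void)
    {
        /* kmalloc: virtually AND physically contiguous, carved out of the
         * direct-mapped region, so no page-table changes are needed. */
        void *small = kmalloc(4096, GFP_KERNEL);

        /* vmalloc: only virtually contiguous; the kernel stitches together
         * possibly-scattered physical pages with new page-table entries. */
        void *big = vmalloc(16 * 1024 * 1024);

        if (!small || !big) {
            kfree(small);
            vfree(big);
            return -ENOMEM;
        }

        kfree(small);
        vfree(big);
        return 0;
    }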

This is the normal/good situation without any highmem, which also applies to 32-bit Linux on systems with significantly less than 1GiB of physical memory. This is why Linus says:

virtual space needs to be bigger than physical space. Not “as big”. Not “smaller”. It needs to be bigger, by a factor of at least two, and that’s quite frankly pushing it, and you’re much better off having a factor of ten or more.

This is why Linus later says "Even before PAE, the practical limit was around 1GB..." With a 3:1 split to leave 3GB of virtual address space for user-space, that only leaves 1GiB of virtual address space for the kernel, just enough to map most of the physical memory. Or with a 2:2 split, to map all of it and have room for vmalloc stuff.

Hopefully this answer sheds more light on the subject than Linus's amusing "Anybody who doesn't get that is a moron. End of discussion." (From context, he's actually aiming that insult at CPU architects who made PAE, not people learning about OSes, don't worry :P)


So what can the kernel do with highmem? It can use it to hold user-space virtual pages, because the per-process user-space page tables can refer to that memory without a problem.

Many of the times when the kernel has to access that memory are when the task is the current one, using a user pointer. e.g. read/write system calls invoke copy_to/from_user with the user-space address (copying to/from the pagecache for a file read/write), reaching the highmem through the user page table entries.
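
As a hedged illustration of that pattern, here's roughly what a driver's read handler looks like (demo_read and the message are made up; copy_to_user is the real API). Because it runs in the context of the calling task, the user buffer is reached through that task's page tables, so the destination pages can happily live in highmem:

    #include <linux/fs.h>
    #include <linux/uaccess.h>  /* copy_to_user */

    /* Hypothetical .read implementation: copy kernel data out through the
     * user-supplied pointer while the owning task is current. */
    static ssize_t demo_read(struct file *file, char __user *buf,
                             size_t count, loff_t *ppos)
    {
        static const char msg[] = "hello from the kernel\n";
        size_t avail = sizeof(msg);

        if (*ppos >= avail)
            return 0;
        if (count > avail - *ppos)
            count = avail - *ppos;

        if (copy_to_user(buf, msg + *ppos, count))
            return -EFAULT;

        *ppos += count;
        return count;
    }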

If the data isn't hot in the pagecache, the read will block while DMA from disk (or network for NFS or whatever) is queued up. But that just brings file data into the pagecache, and I guess the copying into user-owned pages will happen after a context switch back to the task with the suspended read call.

But what if the kernel wants to swap out some pages from a process that isn't running? DMA works on physical addresses, so it can probably calculate the right physical address, as long as it doesn't need to actually load any of that user-space data.

(But it's usually not that simple, IIRC: DMA devices in 32-bit systems may not support high physical addresses. So the kernel might actually need bounce buffers in lowmem... I concur with Linus: highmem sucked, and using a 64-bit kernel is obviously much better, even if you want to run a pure 32-bit user-space.)

Anything like zswap that compresses pages on the fly, or any driver that does need to copy data using the CPU, would need a virtual mapping of the page it was copying to/from.
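
When the kernel does have to touch a possibly-highmem page with the CPU, it sets up a short-lived kernel mapping first. Here's a minimal sketch using the real kmap_local_page()/kunmap_local() API (older kernels used kmap_atomic()); the wrapper function itself is hypothetical:

    #include <linux/highmem.h>  /* kmap_local_page, kunmap_local */
    #include <linux/string.h>   /* memcpy */

    /* Copy data out of a page that might live in highmem: map it into the
     * kernel's address space temporarily, copy, then drop the mapping.
     * On a 64-bit kernel this is effectively free: the page is already
     * reachable through the direct map. */
    static void copy_from_maybe_highmem(struct page *page, void *dst, size_t len)
    {
        void *src = kmap_local_page(page);

        memcpy(dst, src, len);
        kunmap_local(src);
    }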

Another problem is POSIX async I/O that lets the kernel complete I/O while the process isn't active (and thus its page table isn't in use). Copying from user-space to the pagecache / write buffer can happen right away if there's enough free space, but if not you'd want to let the kernel read pages when convenient. Especially for direct I/O (bypassing pagecache).


Brendan also points out that MMIO (and the VGA aperture) need virtual address space for the kernel to access them; often 128MiB, so your 1GiB of kernel virtual address space ends up as 128MiB of I/O space and 896MiB of lowmem (permanently mapped memory).


The kernel needs lowmem for per-process things including kernel stacks for every task (aka thread), and for page tables themselves. (Because the kernel has to be able to read and modify the page tables for any process efficiently.) When Linux moved to using 8kiB kernel stacks, that meant that it had to find 2 contiguous pages (because they're allocated out of the direct-mapped region of address space). Fragmentation of lowmem apparently was a problem for some people unwisely running 32-bit kernels on big servers with tons of threads.
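
For what that allocation looks like, a hedged sketch using the real __get_free_pages() API: order 1 means 2^1 = 2 physically contiguous pages (8kiB with 4kiB pages) out of the direct-mapped region, which is exactly the kind of request that fails when lowmem gets fragmented (the wrapper functions are made up):

    #include <linux/gfp.h>  /* __get_free_pages, free_pages */

    /* Order-1 allocation: two physically contiguous pages from lowmem,
     * like an 8kiB kernel stack. Returns 0 if no such run can be found. */
    static unsigned long alloc_two_contiguous_pages(void)
    {
        return __get_free_pages(GFP_KERNEL, 1);
    }

    static void free_two_contiguous_pages(unsigned long addr)
    {
        free_pages(addr, 1);
    }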

Upvotes: 2

KMG

Reputation: 1511

Say, for example, that we have 32-bit Linux with a 3:1 GB (user:kernel) split of the 4 GB of available virtual address space, and two machines with 512 MB and 2 GB of physical memory respectively.

For the 512 MB machine, the Linux kernel can directly map the whole of physical memory into its kernel space, in the lowmem virtual region starting at PAGE_OFFSET. No problem at all.

But what about the 2 GB machine? We have 2 GB of physical RAM that we want to map into the kernel's lowmem virtual region, but it just can't fit, since the kernel's virtual address space is only 1 GB, as we said at the beginning. To solve this, Linux directly maps a portion of physical RAM into its lowmem region (896 MiB on 32-bit x86, though the exact amount varies between architectures) and sets up temporary virtual mappings on demand for the remaining RAM. That remaining RAM is the so-called high-memory region.
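
As a quick sanity check of those numbers (assuming the usual 32-bit x86 defaults of a 1 GB kernel half with roughly 128 MB carved out for vmalloc/ioremap), a tiny userspace C program works through the arithmetic for the 2 GB machine:

    #include <stdio.h>

    int main(void)
    {
        /* Illustrative defaults; the exact split is configurable. */
        const unsigned long long MiB = 1ULL << 20;
        const unsigned long long kernel_vspace = 1024 * MiB; /* 1 GiB kernel half       */
        const unsigned long long vmalloc_space = 128 * MiB;  /* vmalloc/ioremap/fixmaps */
        const unsigned long long lowmem_max    = kernel_vspace - vmalloc_space; /* 896 MiB */
        const unsigned long long ram           = 2048 * MiB; /* the 2 GB machine        */

        printf("lowmem (direct-mapped):        %llu MiB\n", lowmem_max / MiB);
        printf("highmem (temporary mappings):  %llu MiB\n", (ram - lowmem_max) / MiB);
        return 0;
    }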

This sort of hack is no longer needed in 64-bit mode, since the kernel can directly map all physical memory in the lowmem region (the direct mapping covers 64 TB on x86-64 with 4-level page tables).

Finally, don't confuse this region with the global variable high_memory, which just marks the upper bound of the lowmem region: the difference between high_memory and PAGE_OFFSET gives you the size of the directly mapped (lowmem) RAM in bytes, which equals the total RAM only on a machine with no highmem.
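
To make that concrete, here's a hypothetical kernel-module snippet (high_memory and PAGE_OFFSET are the real symbols; the module itself is just a sketch):

    #include <linux/module.h>
    #include <linux/mm.h>   /* high_memory */
    #include <asm/page.h>   /* PAGE_OFFSET */

    static int __init lowmem_size_init(void)
    {
        /* Size of the direct-mapped (lowmem) region, NOT total RAM
         * on a machine that has highmem. */
        unsigned long lowmem_bytes = (unsigned long)high_memory - PAGE_OFFSET;

        pr_info("lowmem (direct-mapped) size: %lu MiB\n", lowmem_bytes >> 20);
        return 0;
    }

    static void __exit lowmem_size_exit(void)
    {
    }

    module_init(lowmem_size_init);
    module_exit(lowmem_size_exit);
    MODULE_LICENSE("GPL");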

Hope it's clear now.

Upvotes: 1
