Mark

Reputation: 6464

libvirt: use of hugepages on NUMA system

The machine has 4 NUMA nodes and is booted with the kernel boot parameter default_hugepagesz=1G. I start the VM with libvirt/virsh, and I can see that QEMU is launched with -m 65536 ... -mem-prealloc -mem-path /mnt/hugepages/libvirt/qemu, i.e. the virtual machine is started with 64 GB of memory and is asked to allocate the guest memory from a temporarily created file under /mnt/hugepages/libvirt/qemu:

% fgrep Huge /proc/meminfo
AnonHugePages:    270336 kB
ShmemHugePages:        0 kB
HugePages_Total:     113
HugePages_Free:       49
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:    1048576 kB
Hugetlb:        118489088 kB
%
% numastat -cm -p `pidof qemu-system-x86_64`
Per-node process memory usage (in MBs) for PID 3365 (qemu-system-x86)
         Node 0 Node 1 Node 2 Node 3 Total
         ------ ------ ------ ------ -----
Huge      29696   7168      0  28672 65536
Heap          0      0      0     31    31
Stack         0      0      0      0     0
Private       4      9      4    305   322
-------  ------ ------ ------ ------ -----
Total     29700   7177      4  29008 65889
...
                 Node 0 Node 1 Node 2 Node 3  Total
                 ------ ------ ------ ------ ------
MemTotal         128748 129017 129017 129004 515785
MemFree           98732  97339 100060  95848 391979
MemUsed           30016  31678  28957  33156 123807
...
AnonHugePages         0      4      0    260    264
HugePages_Total   29696  28672  28672  28672 115712
HugePages_Free        0  21504  28672      0  50176
HugePages_Surp        0      0      0      0      0
%

This output confirms that the host's 512 GB of memory are split equally across the NUMA nodes, and that the hugepages are also distributed roughly equally across the nodes.
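As a sanity check, the global pool figures quoted from /proc/meminfo above account exactly for the guest; a small shell sketch using those numbers as sample input:

```shell
# Sample lines as quoted from /proc/meminfo above.
meminfo='HugePages_Total:     113
HugePages_Free:       49
Hugepagesize:    1048576 kB'

# Extract total and free 1 GiB pages from the sample text.
total=$(echo "$meminfo" | awk '/HugePages_Total/ {print $2}')
free=$(echo "$meminfo" | awk '/HugePages_Free/ {print $2}')

# 113 total - 49 free = 64 pages in use: the VM's 64 x 1 GiB of guest RAM.
echo "in use: $((total - free))"
```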

The question is: how does QEMU (or KVM?) determine how many hugepages to allocate? Note that the libvirt XML has the following directive:

<memoryBacking>
   <hugepages/>
   <locked/>
</memoryBacking>

However, it is unclear from https://libvirt.org/formatdomain.html#memory-tuning what the defaults for hugepage allocation are, and from which nodes the pages are taken. Is it possible to have all of the VM's memory allocated from node 0? What is the right way to do this?

UPDATE: Since my VM workload is actually pinned to a set of cores on a single NUMA node (node 0) using the <vcpupin> element, I thought it would be a good idea to force QEMU to allocate memory from the same NUMA node:

<numatune>
   <memory mode="strict" nodeset="0"/>
</numatune>

However, this didn't work; QEMU reported an error in its log:

os_mem_prealloc insufficient free host memory pages available to allocate guest ram

Does this mean that it fails to find enough free hugepages on NUMA node 0?

Upvotes: 1

Views: 3331

Answers (2)

poige

Reputation: 1853

Does it mean it fails to find free huge pages on the numa node 0?

Yes, it does.

numastat -m can be used to find out how many hugepages there are in total, and how many are free, on each node.
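The per-node pools can also be inspected and grown directly through sysfs; a sketch of host-configuration commands, assuming a kernel booted with 1 GiB hugepages (the page count of 64 is illustrative, and the allocation may come up short if node 0's memory is fragmented):

```shell
# Per-node 1 GiB hugepage pools live under /sys/devices/system/node/.
# Reserve 64 one-gigabyte pages on host NUMA node 0 (run as root):
echo 64 > /sys/devices/system/node/node0/hugepages/hugepages-1048576kB/nr_hugepages

# Verify how many pages were actually allocated and how many are free:
cat /sys/devices/system/node/node0/hugepages/hugepages-1048576kB/nr_hugepages
cat /sys/devices/system/node/node0/hugepages/hugepages-1048576kB/free_hugepages
```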

Upvotes: 0

DanielB

Reputation: 2816

If you use a plain <hugepages/> element, then libvirt will configure QEMU to allocate from the default huge page pool. Given your 'default_hugepagesz=1G', that should mean that QEMU allocates 1 GB sized pages. QEMU will allocate as many as are needed to satisfy the requested RAM size. Given your configuration, these huge pages can potentially be allocated from any NUMA node.

With a more advanced libvirt configuration it is possible to request a specific huge page size, and to pick the pages from specific NUMA nodes. The latter is only really needed if you are also locking vCPUs to a specific host NUMA node.
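For example, a domain XML along these lines (a sketch based on the libvirt documentation, not a tested configuration) requests 1 GiB pages explicitly and pins the allocation to host node 0:

```xml
<memoryBacking>
  <hugepages>
    <page size="1048576" unit="KiB"/>
  </hugepages>
  <locked/>
</memoryBacking>
<numatune>
  <memory mode="strict" nodeset="0"/>
</numatune>
```

With mode="strict", enough free 1 GiB pages must already exist in node 0's pool, otherwise QEMU fails at startup with the os_mem_prealloc error seen in the question.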

Upvotes: 2
