Reputation: 6464
The machine has 4 NUMA nodes and is booted with the kernel boot parameter default_hugepagesz=1G
. I start the VM with libvirt/virsh, and I can see that qemu
launches with -m 65536 ... -mem-prealloc -mem-path /mnt/hugepages/libvirt/qemu
, i.e. it starts the virtual machine with 64GB of memory and asks it to allocate the guest memory from a temporarily created file under /mnt/hugepages/libvirt/qemu:
% fgrep Huge /proc/meminfo
AnonHugePages: 270336 kB
ShmemHugePages: 0 kB
HugePages_Total: 113
HugePages_Free: 49
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 1048576 kB
Hugetlb: 118489088 kB
%
% numastat -cm -p `pidof qemu-system-x86_64`
Per-node process memory usage (in MBs) for PID 3365 (qemu-system-x86)
Node 0 Node 1 Node 2 Node 3 Total
------ ------ ------ ------ -----
Huge 29696 7168 0 28672 65536
Heap 0 0 0 31 31
Stack 0 0 0 0 0
Private 4 9 4 305 322
------ ------ ------ ------ -----
Total 29700 7177 4 29008 65889
...
Node 0 Node 1 Node 2 Node 3 Total
------ ------ ------ ------ ------
MemTotal 128748 129017 129017 129004 515785
MemFree 98732 97339 100060 95848 391979
MemUsed 30016 31678 28957 33156 123807
...
AnonHugePages 0 4 0 260 264
HugePages_Total 29696 28672 28672 28672 115712
HugePages_Free 0 21504 28672 0 50176
HugePages_Surp 0 0 0 0 0
%
This output confirms that the host's 512GB of memory is split equally across the NUMA
nodes, and that the hugepages are likewise distributed roughly equally across the nodes.
The question is: how does qemu (or kvm?) determine how many hugepages
to allocate, and from which nodes? Note that the libvirt
xml has the following directive:
<memoryBacking>
<hugepages/>
<locked/>
</memoryBacking>
However, it is unclear from https://libvirt.org/formatdomain.html#memory-tuning what the defaults for hugepage allocation are, and on which nodes the pages are allocated. Is it possible to have all of the VM's memory allocated from node 0? What is the right way of doing this?
UPDATE
Since my VM
workload is actually pinned to a set of cores on a single NUMA node (node 0) using the <vcpupin>
element, I thought it would be a good idea to force qemu to allocate memory from the same NUMA node:
<numatune>
<memory mode="strict" nodeset="0"/>
</numatune>
However, this didn't work; qemu reported the following error in its log:
os_mem_prealloc insufficient free host memory pages available to allocate guest ram
Does it mean it fails to find free huge pages on the numa node 0?
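Indeed, the numastat output above shows HugePages_Free is 0 on node 0. As a sketch of a possible workaround (assuming the default 1G page size from the boot parameter, root access, and an illustrative page count), additional 1G pages can be reserved specifically on node 0 via sysfs before starting the VM:

```shell
# Free 1G hugepages currently available on node 0
cat /sys/devices/system/node/node0/hugepages/hugepages-1048576kB/free_hugepages

# Set the number of 1G pages reserved on node 0 to 64 (illustrative count;
# requires root, and may fail if node 0 lacks enough contiguous free memory)
echo 64 > /sys/devices/system/node/node0/hugepages/hugepages-1048576kB/nr_hugepages
```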
Upvotes: 1
Views: 3331
Reputation: 1853
Does it mean it fails to find free huge pages on the numa node 0?
Yes, it does.
numastat -m
can be used to find out how many huge pages there are in total, and how many are free, on each node.
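As a sketch (assuming 1G pages, per your boot parameter), the same per-node counts are also exposed under sysfs:

```shell
# Total and free 1G hugepages on each node
for n in /sys/devices/system/node/node*; do
  echo "$n: total=$(cat $n/hugepages/hugepages-1048576kB/nr_hugepages)" \
       "free=$(cat $n/hugepages/hugepages-1048576kB/free_hugepages)"
done
```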
Upvotes: 0
Reputation: 2816
If you use a plain <hugepages/>
element, then libvirt will configure QEMU to allocate from the default huge page pool. Given your 'default_hugepagesz=1G', that should mean that QEMU allocates 1 GB sized pages. QEMU will allocate as many as are needed to satisfy the requested RAM size. Given your configuration, these huge pages can potentially be allocated from any NUMA node.
With more advanced libvirt configuration it is possible to request allocation of a specific size of huge page, and pick them from specific NUMA nodes. The latter is only really needed if you are also locking CPUs to a specific host NUMA node.
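As a sketch of what that more advanced configuration might look like (untested here; note that <numatune> controls which host nodes the memory comes from, while the nodeset on <page> refers to guest NUMA nodes and therefore requires a guest NUMA topology to be defined in the domain XML):

```xml
<memoryBacking>
  <hugepages>
    <page size="1" unit="G" nodeset="0"/>
  </hugepages>
</memoryBacking>
<numatune>
  <memory mode="strict" nodeset="0"/>
</numatune>
```

If you only need all guest memory to come from host node 0, the <numatune> element alone should suffice, provided enough 1G pages are free on that node.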
Upvotes: 2