Reputation: 15070
The documentation I've found so far on _mm_malloc() is quite scarce. In particular, I can't figure out what will happen if I pass it a size parameter that is not a multiple of align. Is it UB? Or will it allocate the number of bytes which is the next larger multiple of align?
Upvotes: 0
Views: 1087
Reputation: 365247
Intel's documentation for _mm_malloc in their own compiler only says "This [align] constraint must be a power of two."
There's no requirement that size be a multiple of alignment, because the main use case for _mm_malloc is SIMD, where it's totally normal to allocate an array with alignment greater than the width of a single element (e.g. a float* aligned to 32B for AVX). Other use cases are cache-line, page, or hugepage boundaries: e.g. to take better advantage of transparent hugepages, you might allocate with 2MB alignment for any allocation greater than 2MB.
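A minimal sketch of both use cases, assuming an x86 compiler (GCC, Clang, MSVC or ICC) where <immintrin.h> declares _mm_malloc and _mm_free. The helper names alloc_simd_floats, alloc_big_buffer and is_aligned_to, the 64-byte fallback and the exact 2 MiB threshold are illustrative choices, not part of any API.

```c
#include <assert.h>
#include <immintrin.h> /* declares _mm_malloc / _mm_free on x86 compilers */
#include <stddef.h>
#include <stdint.h>

#define HUGEPAGE_BYTES ((size_t)2 * 1024 * 1024) /* assumed 2 MiB x86-64 hugepage */

/* True if p is a multiple of align; align must be a power of two. */
static int is_aligned_to(const void *p, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return ((uintptr_t)p & (align - 1)) == 0;
}

/* Hypothetical helper: n floats aligned for 32-byte AVX loads/stores.
   The byte count need not be a multiple of 32 (n = 100 gives 400 bytes). */
static float *alloc_simd_floats(size_t n) {
    if (n > SIZE_MAX / sizeof(float))
        return NULL; /* n * sizeof(float) would overflow */
    return (float *)_mm_malloc(n * sizeof(float), 32);
}

/* Hypothetical policy: 2 MiB alignment for buffers larger than one hugepage
   (so transparent hugepages can back them), 64-byte cache-line alignment otherwise. */
static void *alloc_big_buffer(size_t size) {
    size_t align = size > HUGEPAGE_BYTES ? HUGEPAGE_BYTES : 64;
    return _mm_malloc(size, align);
}
```

Both kinds of buffer are released with _mm_free, not free.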
The only aligned allocator I'm aware of that does have the limitation you're worried about is C11 / C++17 aligned_alloc, which is unfortunately required to fail when size % align != 0. See my answer on "How to solve the 32-byte-alignment issue for AVX load/store operations?". TL;DR: The original C11 aligned_alloc was UB with non-multiple-of-align sizes, so real implementations chose to make it work as expected, like other aligned allocators (e.g. posix_memalign). But then it was changed to being required to fail (return an error) in that case instead of UB, so implementations that allowed it to work are technically violating a (stupid) standard. C++17 has the required-to-fail version.
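If you still want aligned_alloc's API (a result you can pass to plain free), a common workaround is to round size up to the next multiple of align yourself before calling it, so the call is valid whether or not an implementation enforces the multiple-of-align rule. A minimal sketch, assuming a C11 library that provides aligned_alloc and a power-of-two align; round_up_to and aligned_alloc_rounded are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Round size up to the next multiple of align (a power of two).
   Returns 0 if the rounded value would overflow size_t. */
static size_t round_up_to(size_t size, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    if (size > SIZE_MAX - (align - 1))
        return 0;
    return (size + align - 1) & ~(align - 1);
}

/* aligned_alloc wrapper that never passes a size that isn't a multiple of align.
   Memory from it is released with plain free(). */
static void *aligned_alloc_rounded(size_t align, size_t size) {
    size_t padded = round_up_to(size, align);
    if (padded == 0) /* overflow, or size == 0 */
        return NULL;
    return aligned_alloc(align, padded);
}
```

The cost is up to align - 1 bytes of padding per allocation, which is negligible for 32B or 64B alignment but not for page-sized or larger alignments.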
Obviously Intel didn't make the same mistake that the standards committee did with aligned_alloc, because it would defeat the purpose of _mm_malloc for optimization. Of course they had the SIMD and memory-boundary use cases in mind. (IDK how the standards committee didn't; SIMD seems like the obvious main use case for types/buffers with more alignment than the natural alignment of the widest type.) It's really disappointing that the one function with the nicest API is not safe to use. (aligned_alloc returns memory freeable with free, and doesn't defeat optimization by taking the address of the pointer as an argument like posix_memalign does, which leads to compilers worrying about aliasing.)
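For comparison, a minimal posix_memalign wrapper (hypothetical name alloc_via_posix, assuming a POSIX system). Note the out-parameter: the address of the caller's pointer is passed in, which is the API shape criticized above. posix_memalign has no size-multiple requirement; it only requires align to be a power of two multiple of sizeof(void *), and, like aligned_alloc, its memory is released with free. Memory from _mm_malloc, by contrast, must be released with _mm_free.

```c
#define _POSIX_C_SOURCE 200112L /* expose posix_memalign */
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Returns align-aligned memory of exactly size bytes, or NULL on failure.
   size does not have to be a multiple of align. */
static void *alloc_via_posix(size_t align, size_t size) {
    void *p = NULL;
    /* The pointer comes back through &p; the return value is an error code
       (EINVAL for a bad alignment, ENOMEM when out of memory). */
    if (posix_memalign(&p, align, size) != 0)
        return NULL;
    assert(((uintptr_t)p & (align - 1)) == 0);
    return p;
}
```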
Or will it allocate the number of bytes which is the next larger multiple of align?
That might be effectively true for small alignments like 32B or 64B. Depending on the implementation, it might not leave that slack space at the end available for smaller allocations with malloc or with smaller-alignment calls to _mm_malloc. It's safe to read up to the alignment boundary without faulting (if it's less than a 4k page), but don't write to it if you didn't explicitly allocate it.
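A minimal sketch of the "don't write past what you allocated" rule, assuming an x86 target with SSE (baseline on x86-64) and a buffer from _mm_malloc with 16-byte alignment; scale_floats is a hypothetical name. Full 4-float vectors use aligned loads and stores, and the n % 4 leftover elements are handled by a scalar loop, so nothing is stored into the slack after the last requested element.

```c
#include <assert.h>
#include <immintrin.h> /* SSE intrinsics plus _mm_malloc / _mm_free (x86) */
#include <stddef.h>
#include <stdint.h>

/* Multiply a[0..n-1] by k in place; a must be 16-byte aligned. */
static void scale_floats(float *a, size_t n, float k) {
    assert(((uintptr_t)a & 15) == 0);
    __m128 vk = _mm_set1_ps(k);
    size_t i = 0;
    for (; i + 4 <= n; i += 4) /* whole 16-byte vectors only */
        _mm_store_ps(a + i, _mm_mul_ps(_mm_load_ps(a + i), vk));
    for (; i < n; i++)         /* scalar tail: stays inside the allocation */
        a[i] *= k;
}
```

With n = 10, for example, the vector loop covers elements 0 to 7 and the scalar loop covers elements 8 and 9.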
In any good quality implementation, it's exceedingly unlikely that a large alignment will waste multiple whole pages. You could always test by doing many allocations with huge alignments (like _mm_malloc(3M, 2M)) and some allocations that could use that space (like _mm_malloc(512k, 4k)), then sleep(100). Look at the memory footprint of your process before it exits.
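A minimal version of that experiment, assuming Linux (for /proc/self/statm) and an x86 compiler providing _mm_malloc. Instead of sleep(100) plus an external tool, this variant reads its own virtual and resident size while everything is still allocated. The helper names and the allocation counts (8 x 3 MiB at 2 MiB alignment, 16 x 512 KiB at 4 KiB alignment) are arbitrary choices for illustration.

```c
#include <assert.h>
#include <immintrin.h> /* _mm_malloc / _mm_free (x86) */
#include <stdio.h>
#include <string.h>
#include <unistd.h>    /* sysconf */

#define MIB ((size_t)1024 * 1024)
#define N_BIG 8        /* 8 x 3 MiB at 2 MiB alignment */
#define N_SMALL 16     /* 16 x 512 KiB at 4 KiB alignment */

/* Read virtual and resident size in KiB from /proc/self/statm (Linux only).
   Returns 0 on success, -1 if the file is unavailable or unreadable. */
static int read_statm_kib(long *virt_kib, long *res_kib) {
    long page_kib = sysconf(_SC_PAGESIZE) / 1024;
    assert(page_kib > 0);
    long size_pages, res_pages;
    FILE *f = fopen("/proc/self/statm", "r");
    if (f == NULL)
        return -1;
    int n = fscanf(f, "%ld %ld", &size_pages, &res_pages);
    fclose(f);
    if (n != 2)
        return -1;
    *virt_kib = size_pages * page_kib;
    *res_kib = res_pages * page_kib;
    return 0;
}

/* Keep all allocations live while measuring, then free them.
   Returns 0 on success, -1 if an allocation or the measurement failed. */
static int measure_mixed_alignments(long *virt_kib, long *res_kib) {
    void *big[N_BIG] = {0};
    void *small[N_SMALL] = {0};
    int rc = 0;
    for (int i = 0; i < N_BIG; i++) {
        big[i] = _mm_malloc(3 * MIB, 2 * MIB);
        if (big[i] == NULL) rc = -1;
        else memset(big[i], 1, 3 * MIB); /* touch so the pages are resident */
    }
    for (int i = 0; i < N_SMALL; i++) {
        small[i] = _mm_malloc(512 * 1024, 4096);
        if (small[i] == NULL) rc = -1;
        else memset(small[i], 1, 512 * 1024);
    }
    if (rc == 0)
        rc = read_statm_kib(virt_kib, res_kib);
    for (int i = 0; i < N_BIG; i++) _mm_free(big[i]);     /* _mm_free(NULL) is a no-op */
    for (int i = 0; i < N_SMALL; i++) _mm_free(small[i]);
    return rc;
}
```

Print both numbers from main and compare them with the 32 MiB (32768 KiB) actually requested. Alignment padding that is never touched counts toward the virtual size but not the resident size, so comparing the two is more informative than either alone.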
Upvotes: 1
Reputation: 211680
These are two independent factors: size dictates the raw size, and align is simply the placement of the allocated block. In actual code you might see a correlation (the reason you want something aligned is usually that size is an even multiple of some factor), but it's not a hard requirement.
You may have a perfectly valid reason for allocating 79 bytes aligned on an 8 byte basis.
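To see that the two parameters really are independent, a minimal sketch (hypothetical function name, assuming an x86 compiler providing _mm_malloc) that tries every combination of a few sizes and power-of-two alignments, including 79 bytes at 8-byte alignment and 4097 bytes at page alignment, writes exactly the requested bytes, and checks the placement.

```c
#include <assert.h>
#include <immintrin.h> /* _mm_malloc / _mm_free (x86) */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns how many (size, align) pairs were allocated with correct alignment.
   Several pairs deliberately have size % align != 0. */
static int check_size_align_pairs(void) {
    static const size_t sizes[] = {1, 79, 100, 4097};
    static const size_t aligns[] = {8, 32, 64, 4096};
    int good = 0;
    for (size_t i = 0; i < sizeof sizes / sizeof sizes[0]; i++) {
        for (size_t j = 0; j < sizeof aligns / sizeof aligns[0]; j++) {
            size_t align = aligns[j];
            assert((align & (align - 1)) == 0); /* power of two */
            unsigned char *p = _mm_malloc(sizes[i], align);
            if (p == NULL)
                continue;
            memset(p, 0xAB, sizes[i]); /* only the requested bytes */
            if (((uintptr_t)p & (align - 1)) == 0)
                good++;
            _mm_free(p);
        }
    }
    return good;
}
```

On a working implementation it returns 16, one per pair; nothing in it relies on size being a multiple of align.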
Upvotes: 2