Artyom

Reputation: 31271

Unaligned load/store in gcc vector extension

I need to access unaligned values using the GCC vector extension.

The program below crashes when compiled with either clang or gcc:

typedef int __attribute__((vector_size(16))) int4;
typedef int __attribute__((vector_size(16),aligned(4))) *int4p;

int main()
{
        int v[64] __attribute__((aligned(16))) = {};
        int4p ptr = reinterpret_cast<int4p>(&v[7]);
        int4 val = *ptr;
}

However, if I change

typedef int __attribute__((vector_size(16),aligned(4))) *int4p;

to

typedef int __attribute__((vector_size(16),aligned(4))) int4u;
typedef int4u *int4up;

then the generated assembly code is correct (it uses an unaligned load) in both clang and gcc.

What is wrong with the single-typedef definition, or what am I missing? Can it be the same bug in both clang and gcc?

Note: it happens in both clang and gcc

Upvotes: 3

Views: 1534

Answers (1)

Iwillnotexist Idonotexist

Reputation: 13467

TL;DR

You've altered the alignment of the pointer type itself, not the pointee type. This has nothing to do with the vector_size attribute and everything to do with the aligned attribute. It's also not a bug, and it's implemented correctly in both GCC and Clang.

Long Story

From the GCC documentation, § 6.33.1 Common Type Attributes (emphasis added):

aligned (alignment)

This attribute specifies a minimum alignment (in bytes) for variables of the specified type. [...]

The type in question is the type being declared, not the type pointed to by the type being declared. Therefore,

typedef int __attribute__((vector_size(16),aligned(4))) *int4p;

declares a new type T that points to objects of type *T, where:

  • *T is a 16-byte vector with default alignment for its size (16 bytes)
  • T is a pointer type, and variables of this type may be stored at addresses aligned to as little as 4 bytes (even though what they point to, *T, has a much stricter alignment requirement).
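The distinction in the list above can be checked at compile time. The following is only a sketch, assuming GCC or Clang on a target where a 16-byte int vector is 16-byte aligned by default (for example x86-64); the typedef names are the ones from the question, and the static_asserts are added here for illustration.

```cpp
// Sketch only: assumes GCC or Clang on a target such as x86-64, where a
// 16-byte int vector has a default alignment of 16 bytes.
typedef int __attribute__((vector_size(16))) int4;

// One-step typedef: aligned(4) attaches to the pointer type being declared,
// so the pointee is still an ordinary 16-byte-aligned vector.
typedef int __attribute__((vector_size(16), aligned(4))) *int4p;

// Two-step typedef: aligned(4) attaches to the vector type itself, and only
// then is a plain pointer to that under-aligned vector declared.
typedef int __attribute__((vector_size(16), aligned(4))) int4u;
typedef int4u *int4up;

static_assert(alignof(int4) == 16, "plain vector keeps its natural alignment");
static_assert(alignof(int4u) == 4, "the attribute lowered the vector's alignment");
static_assert(sizeof(int4p) == sizeof(void *), "int4p is still an ordinary-sized pointer");
```

If these assertions hold on your toolchain, dereferencing an int4p tells the compiler it may use an aligned load, while dereferencing an int4up does not.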

Meanwhile, § 6.49 Using Vector Instructions through Built-in Functions says (emphasis added):

On some targets, the instruction set contains SIMD vector instructions which operate on multiple values contained in one large register at the same time. For example, on the x86 the MMX, 3DNow! and SSE extensions can be used this way.

The first step in using these extensions is to provide the necessary data types. This should be done using an appropriate typedef:

typedef int v4si __attribute__ ((vector_size (16)));

The int type specifies the base type, while the attribute specifies the vector size for the variable, measured in bytes. For example, the declaration above causes the compiler to set the mode for the v4si type to be 16 bytes wide and divided into int sized units. For a 32-bit int this means a vector of 4 units of 4 bytes, and the corresponding mode of foo is V4SI.

The vector_size attribute is only applicable to integral and float scalars, although arrays, pointers, and function return values are allowed in conjunction with this construct. Only sizes that are a power of two are currently allowed.

Demo

#include <stdio.h>

typedef int __attribute__((aligned(128))) * batcrazyptr;
struct batcrazystruct{
    batcrazyptr ptr;
};

int main()
{
    printf("Ptr:    %zu\n", sizeof(batcrazyptr));
    printf("Struct: %zu\n", sizeof(batcrazystruct));
}

Output:

Ptr:    8
Struct: 128

This is consistent with batcrazyptr itself having its alignment requirement changed, rather than its pointee, and it agrees with the documentation.

Solution

I'm afraid you'll be forced to use a chain of typedefs, as you have done with int4u. It would be unreasonable to have a separate attribute to specify the alignment of each pointer level in a typedef.
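A minimal sketch of how the typedef chain might be wrapped for reuse is shown here. The helper load_unaligned and its count parameter are hypothetical names introduced for illustration; only int4, int4u and the chained typedef come from the question.

```cpp
#include <cassert>
#include <cstddef>

typedef int __attribute__((vector_size(16))) int4;
// Step 1: a vector type whose alignment requirement is lowered to 4 bytes.
typedef int __attribute__((vector_size(16), aligned(4))) int4u;
// Step 2: an ordinary pointer to that under-aligned vector type.
typedef const int4u *int4ucp;

// Hypothetical helper: loads the four consecutive ints starting at
// base[index]. The address only needs the alignment of int, because the
// pointee type is int4u rather than int4.
inline int4 load_unaligned(const int *base, std::size_t count, std::size_t index)
{
    assert(index + 4 <= count);  // all four lanes must lie inside the array
    int4ucp p = reinterpret_cast<int4ucp>(base + index);
    return *p;  // copies the under-aligned vector into a normally aligned int4
}
```

With the array from the question, load_unaligned(v, 64, 7) would read v[7] through v[10] without requiring &v[7] to be 16-byte aligned.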

Upvotes: 8
