user20790983

How would I define the __m256i data type in Ada?

I am trying to write a library for AVX2 in Ada 2012 using the GNAT GCC compiler. I have defined a data type Vector_256_Integer_32 like so:

type Vector_256_Integer_32 is array (0 .. 7) of Integer_32;
pragma Pack (Vector_256_Integer_32);

Note that I have aligned the array to the 32-byte boundary indicated in Intel's documentation of the _mm256_load_si256 intrinsic function from immintrin.h.
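For comparison, here is a minimal C sketch of what a 32-byte alignment requirement means in practice. It assumes GCC or Clang (for the aligned attribute); is_aligned_32 and demo_alignment are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 when p sits on a 32-byte boundary, 0 otherwise. */
int is_aligned_32(const void *p)
{
    return ((uintptr_t)p % 32) == 0;
}

/* Hypothetical demo: the start of an aligned array passes the check,
   while the address of its second element (4 bytes later) does not. */
int demo_alignment(void)
{
    int32_t v[8] __attribute__((aligned(32)));

    v[0] = 0;
    assert(is_aligned_32(v));
    return is_aligned_32(&v[1]);
}
```

_mm256_load_si256 expects an address that passes this kind of check; _mm256_loadu_si256 is the variant that accepts unaligned addresses.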

I would like to implement an operation that adds two of these arrays together using AVX2. The function prototype is as follows.

function Vector_256_Integer_32_Add (Left, Right : Vector_256_Integer_32) return Vector_256_Integer_32;

My idea for implementing this function is to do this in three steps.

  1. Load Left and Right using _mm256_load_si256 into local variables.
  2. Perform the addition operation using _mm256_add_epi32.
  3. Convert the result back into the Vector_256_Integer_32 type using _mm256_store_si256.

What confuses me is how to create the __m256i data type in Ada to hold the intermediate results. Can someone please shed some light on this? Additionally, if you see any issues with my approach, any feedback is appreciated.

I have found the definition of __m256i in GCC (located at gcc/gcc/config/i386/avxintrin.h).

typedef long long __m256i __attribute__ ((__vector_size__ (32), __may_alias__));

However, this is where I am stuck: I am not sure how to translate this into Ada code. I have found that the __vector_size__ attribute is documented here.
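To make the two attributes in that typedef concrete, here is a small C sketch, assuming GCC or Clang. vector_size(32) makes the type a 32-byte vector of its element type, and may_alias lets such an object be accessed through pointers to other types. The names v8si and add8 are hypothetical; the sketch uses eight 32-bit lanes instead of the four long long lanes of __m256i, and copies with memcpy so it does not depend on alignment.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical type: eight 32-bit lanes packed into one 32-byte vector,
   declared with the same vector_size attribute that __m256i uses. */
typedef int32_t v8si __attribute__((vector_size(32)));

/* Adds two 8-element arrays lane by lane through the vector type. */
void add8(const int32_t *a, const int32_t *b, int32_t *r)
{
    v8si va, vb, vr;

    assert(sizeof(v8si) == 32);      /* 8 lanes x 4 bytes */

    memcpy(&va, a, sizeof va);       /* "load" */
    memcpy(&vb, b, sizeof vb);
    vr = va + vb;                    /* element-wise add on the vector type */
    memcpy(r, &vr, sizeof vr);       /* "store" */
}
```

This compiles without AVX2 enabled because the compiler lowers the vector addition to whatever the target supports; with -mavx2, GCC typically turns the addition into a single vpaddd.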

Upvotes: 4

Views: 727

Answers (2)

user20790983

I figured out the answer to my question after doing more research. Thank you for your input. I am posting this so hopefully someone else can get value from this.

Edit: I have adjusted my answer according to feedback from the commenter Peter Cordes.

For example, if you want to define a data type of 8 32-bit signed integers, you would write

type Vector_256_Integer_32 is array (0 .. 7) of Integer_32 with Convention => C, Alignment => 32;

The function to add the two vectors together would be defined as

function "+" (Left, Right: Vector_256_Integer_32) return Vector_256_Integer_32;
pragma Import (Intrinsic, "+", "__builtin_ia32_paddd256");

Note that I am using the GCC builtin rather than the intrinsics from immintrin.h, because I do not know how to import an intrinsic from that header file.

The documentation of _mm256_add_epi32 states that the vpaddd instruction is used. The GCC builtin __builtin_ia32_paddd256 appears to translate to this instruction.

Below are an example package specification (avx2.ads) and a main program (main.adb).

avx2.ads

with Interfaces; use Interfaces;

package AVX2 is

   --
   -- Type Definitions
   --

   -- 256-bit Vector of 32-bit Signed Integers
   type Vector_256_Integer_32 is array (0 .. 7) of Integer_32;
   for Vector_256_Integer_32'Alignment use 32;
   pragma Machine_Attribute (Vector_256_Integer_32, "vector_type");
   pragma Machine_Attribute (Vector_256_Integer_32, "may_alias");

   --
   -- Function Definitions
   --

   -- Function: 256-bit Vector Addition of 32-bit Signed Integers
   function Vector_256_Integer_32_Add
     (Left, Right : Vector_256_Integer_32) return Vector_256_Integer_32 with
     Convention    => Intrinsic, Import => True,
     External_Name => "__builtin_ia32_paddd256";

end AVX2;

main.adb

with AVX2;        use AVX2;
with Interfaces;  use Interfaces;
with Ada.Text_IO; use Ada.Text_IO;

procedure Main is
   a, b, r : Vector_256_Integer_32;
begin
   for i in Vector_256_Integer_32'Range loop
      a (i) := 5 * (Integer_32 (i) + 5);
      b (i) := 12 * (Integer_32 (i) + 12);
   end loop;
   r := Vector_256_Integer_32_Add(a, b);
   for i in Vector_256_Integer_32'Range loop
      Put_Line
        ("r(i) = a(i) + b(i) = " & a (i)'Image & " + " & b (i)'Image & " = " &
         r (i)'Image);
   end loop;
end Main;

Here is an equivalent program in C. Note that this code has only been tested in GCC and is not necessarily the most efficient.

#include <stdio.h>
#include <immintrin.h>
#include <stdint.h>

int main()
{
    __m256i ma;
    __m256i mb;
    __m256i mr;
    int32_t a[8] __attribute__((aligned(32)));
    int32_t b[8] __attribute__((aligned(32)));
    int32_t r[8] __attribute__((aligned(32)));

    for (int i = 0; i < 8; ++i) {
        a[i] = 5 * (i + 5);
        b[i] = 12 * (i + 12);
    }

    ma = _mm256_load_si256((const __m256i *)a);
    mb = _mm256_load_si256((const __m256i *)b);

    mr = _mm256_add_epi32(ma, mb);

    _mm256_store_si256((__m256i *)r, mr);

    for (int i = 0; i < 8; ++i) {
        printf("r[i] = a[i] + b[i] = %d + %d = %d\n", a[i], b[i], r[i]);
    }
}

Upvotes: 2

Jeffrey R. Carter

Reputation: 3358

The first thing you need to do is to learn Ada, since 2/3 of your Ada declarations are invalid. Pragma Pack has only one argument (GNAT does not have an implementation-dependent version with two), and in Ada 2012 you should usually use the aspect rather than the pragma. Alignment is specified separately, with the Alignment aspect or attribute. Ada does not have "function prototypes". The function declaration for your addition operation should be

function "+" (Left : in Vec_256_Unsigned_32; Right : in Vec_256_Unsigned_32) return Vec_256_Unsigned_32;

Ada has operator overloading and packages for encapsulation and namespace control, so you don't need prefixes on everything as you do in languages lacking these essential features.

IIUC, the C definition of __m256i defines an array of long long that occupies 32 bytes. Since Interfaces.C doesn't define an equivalent for long long, the Ada equivalent depends on the size of long long. If it's 64 bits, then it's equivalent to Interfaces.Integer_64, which is 8 bytes, so the Ada equivalent would be

type M256i is array (1 .. 4) of Interfaces.Integer_64 with Convention => C;

(Anything you pass to a C subprogram should be defined in Interfaces.C or its children, or declared with convention C.)

Since both M256i and Vec_256_Unsigned_32 are 32 bytes, you can convert between them using instances of Ada.Unchecked_Conversion.
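In C terms, that kind of conversion amounts to copying the same 32 bytes into an object of the other type. The sketch below is only an illustration of the idea, not Ada code: m256i_words, vec8_u32, to_words and from_words are hypothetical names, and it assumes long long is 64 bits, as discussed above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the two 32-byte Ada array types. */
typedef struct { int64_t  lane[4]; } m256i_words;  /* like M256i */
typedef struct { uint32_t lane[8]; } vec8_u32;     /* like Vec_256_Unsigned_32 */

/* Reinterprets eight 32-bit lanes as four 64-bit lanes (bit pattern unchanged). */
m256i_words to_words(vec8_u32 v)
{
    m256i_words w;

    assert(sizeof w == sizeof v);    /* both are 32 bytes */
    memcpy(&w, &v, sizeof w);
    return w;
}

/* Reinterprets four 64-bit lanes as eight 32-bit lanes (bit pattern unchanged). */
vec8_u32 from_words(m256i_words w)
{
    vec8_u32 v;

    assert(sizeof v == sizeof w);
    memcpy(&v, &w, sizeof v);
    return v;
}
```

No values are converted; only the type used to view the bits changes, so a round trip through both functions returns the original lanes.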

Upvotes: 1
