Joker

Reputation: 263

Using inline assembler in iOS aarch64 application

I tried to compile some inline assembler for a 64-bit iOS application.

Here's an example:

   int roundff(float _value) {
       int res;
       float temp;
       asm("vcvtr.s32.f32 %[temp], %[value] \n vmov %[res], %[temp]" : [res] "=r" (res), [temp] "=w" (temp) : [value] "w" (_value));
       return res;
   }

and I get this error:

Unrecognized instruction mnemonic.

But this code compiles fine:

__asm__ volatile(
                 "add %[result], %[b], %[a];"
                 : [result] "=r" (result)
                 : [a] "r" (a), [b] "r" (b), [c] "r" (c)
                 );

Then I found that on aarch64 I have to use fcvt instead of vcvt, because

int a = (int)(10.123);

compiles into

fcvtzs w8, s8

but I don't know how to write it in inline assembler. Something like this

int roundff(float _value)
{
    int res;
    asm("fcvtzs %[res], %[value]" : [res] "=r" (res) : [value] "w" (_value));
    return res;
}

also doesn't work and generates these errors:

Instruction 'fcvtz' can not set flags, but 's' suffix specified.

Invalid operand for instruction.

Also, I need rounding instead of truncation (fcvtns).

Any help? Where can I read more about ARM (32/64-bit) asm?

UPDATE: OK, this: float res = nearbyintf(v) compiles into the nice instruction frinti s0, s0. But why doesn't my inline assembler work on iOS with the clang compiler?

Upvotes: 1

Views: 1720

Answers (2)

Anders Cedronius

Reputation: 2076

Here is how you do it:

-(int) roundff:(float)a {
    int y;
    __asm__("fcvtzs %w0, %s1\n\t" : "=r"(y) : "w"(a));
    return y;
}

Take care,

/A

Upvotes: 3

Peter Cordes

Reputation: 363932

You can get the rounding you want using standard math.h functions that inline to single ARM instructions. Better yet, the compiler knows what they do, so it may be able to optimize better, e.g. by proving that the integer can't be negative, if that's the case.

Check godbolt for the compiler output:

#include <math.h>

int truncate_f_to_int(float v)
{
  int res = v;  // standard C cast: truncate with fcvtzs on ARM64
  // AMD64: inlines to cvtTss2si rax, xmm0   // Note the extra T for truncate
  return res;
}

int round_f_away_from_zero(float v)
{
    int res = roundf(v);  // optimizes to fcvtas on ARM64
  // AMD64: AND/OR with two constants before converting with truncation
    return res;
}


//#define NOT_ON_GODBOLT
// godbolt has a broken setup and gets x86-64 inline asm for lrintf on ARM64

#if defined(NOT_ON_GODBOLT) || defined (__x86_64__) || defined(__i386__)
int round_f_to_even(float v)
{
  int res =  lrintf(v);  // should inline to a convert using the current rounding mode
  // AMD64: inlines to cvtss2si rax, xmm0
  // nearbyintf(v); // ARM64: calls the math library
  // rintf(v); // ARM64: calls the math library
  return res;
}
#endif

godbolt has a buggy install of headers for non-x86 architectures: it still uses the x86 math headers, including inline asm.

Also, your roundff function with inline asm for fcvtzs compiled just fine on godbolt with gcc 4.8. Maybe you were trying to build for 32-bit ARM? But like I said, use the library function that does what you want, then check that it inlines to nice asm.

Upvotes: 2
