Kroka

Reputation: 475

Floating-point instructions in ARM assembly

I'm trying to create an ARM benchmark that loops over the following instructions (in assembly), alone and in combination:

This is my code for integer operations:

int additions_int(int n) {

    int i, dummyValue = n;

    __asm (
        "MOV R0, #2\n"
        "MOV R1, #6\n"
    );

    for (i = 0; i < n/LOOP_STEP; i++) {

        __asm (
            "ADD R0, R0, R1\n"
            "ADD R0, R0, R1\n"
            "ADD R0, R0, R1\n"
            "ADD R0, R0, R1\n"
            "ADD R0, R0, R1\n"
            "ADD R0, R0, R1\n"
            "ADD R0, R0, R1\n"
            "ADD R0, R0, R1\n"
            "ADD R0, R0, R1\n"
            "ADD R0, R0, R1\n"
        );
    }

    return dummyValue;
}


int multiplications_int(int n) {

    int i, dummyValue=n;

    __asm (
        "MOV R0, #2\n"
        "MOV R1, #6\n"
    );

    for (i = 0; i < n/LOOP_STEP; i++) {

        __asm (

            "MUL R0, R0, R1\n"
            "MUL R0, R0, R1\n"
            "MUL R0, R0, R1\n"
            "MUL R0, R0, R1\n"
            "MUL R0, R0, R1\n"
            "MUL R0, R0, R1\n"
            "MUL R0, R0, R1\n"
            "MUL R0, R0, R1\n"
            "MUL R0, R0, R1\n"
            "MUL R0, R0, R1\n"

        );

    }

    return dummyValue;
}

The problem is with the floating-point operations. I checked this documentation, and I've tried something like this:

float multiplications_fp(int n) {
    int i;
    float fn=n, dummyValue = fn;

    for (i = 0; i < n/LOOP_STEP; i++) {
        __asm (
            "VMUL.F32 R0, R0, R1\n"
            "VMUL.F32 R0, R0, R1\n"
            "VMUL.F32 R0, R0, R1\n"
            "VMUL.F32 R0, R0, R1\n"
            "VMUL.F32 R0, R0, R1\n"
            "VMUL.F32 R0, R0, R1\n"
            "VMUL.F32 R0, R0, R1\n"
            "VMUL.F32 R0, R0, R1\n"
            "VMUL.F32 R0, R0, R1\n"
            "VMUL.F32 R0, R0, R1\n"
        );
    }

    return dummyValue;
}


float additions_fp(int n) {
    int i;
    float fn=n, dummyValue = fn;

    for (i = 0; i < n/LOOP_STEP; i++) {
        __asm (
            "VADD.F32 R0, R0, R1\n" 
            "VADD.F32 R0, R0, R1\n" 
            "VADD.F32 R0, R0, R1\n" 
            "VADD.F32 R0, R0, R1\n" 
            "VADD.F32 R0, R0, R1\n"
            "VADD.F32 R0, R0, R1\n" 
            "VADD.F32 R0, R0, R1\n" 
            "VADD.F32 R0, R0, R1\n" 
            "VADD.F32 R0, R0, R1\n" 
            "VADD.F32 R0, R0, R1\n"  
        );
    }

    return dummyValue;
}

Compiling with:

arm-linux-gnueabi-gcc -static -march=armv7-a microbenchmark_arm.c -o microbenchmark_arm

I'm getting this error:

Error: selected processor does not support ARM mode `vmul.f32 R0,R0,R1'
Error: selected processor does not support ARM mode `vadd.f32 R0,R0,R1'

Can anyone tell me what I'm doing wrong?

Can anyone show me an example of floating-point additions or multiplications for the ARM Cortex-A architecture?

Upvotes: 1

Views: 6951

Answers (1)

Dric512

Reputation: 3729

Floating-point (VFP) instructions operate on their own register bank, not on the integer registers R0-R15, and most instructions cannot mix the two banks. The floating-point bank is the same one used by the NEON SIMD instructions.

If you want single-precision, you can use:

VMUL.F32 s0, s0, s1

If you want double precision, you can use:

VMUL.F64 d0, d0, d1

Note that the floating-point unit may need to be enabled first if the OS has not already done so.
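To tie this back to the benchmark in the question, here is a minimal sketch of the two loops using single-precision VFP registers. Several things here are my assumptions, not something from the question: the function signatures (I pass the accumulator and operand in so the result depends on the asm), the value of LOOP_STEP, and the HAVE_VFP guard with a plain C fallback so the file still builds without an FPU. It uses GCC's "t" operand constraint so the compiler picks the s-registers itself. With a soft-float toolchain such as arm-linux-gnueabi-gcc you will most likely also need to select an FPU, for example by adding -mfpu=vfpv3 -mfloat-abi=softfp to the command line; treat those exact flags as an assumption to check against your target.

```c
/* Assumption: 10 instructions per loop body, as in the question.
   The question does not show how LOOP_STEP is defined. */
#define LOOP_STEP 10

/* Hypothetical guard: use the VFP path only when building for ARM with
   hardware floating point enabled (GCC defines __SOFTFP__ otherwise). */
#if defined(__arm__) && !defined(__SOFTFP__)
#define HAVE_VFP 1
#else
#define HAVE_VFP 0
#endif

/* Multiplies acc by factor LOOP_STEP times per loop iteration. */
float multiplications_fp(float acc, float factor, int n) {
    int i;
    for (i = 0; i < n / LOOP_STEP; i++) {
#if HAVE_VFP
        /* "t" asks GCC for a single-precision VFP register (s0-s31),
           so %0 and %1 expand to s-registers, not R0/R1. */
        __asm__ volatile (
            "vmul.f32 %0, %0, %1\n\t"
            "vmul.f32 %0, %0, %1\n\t"
            "vmul.f32 %0, %0, %1\n\t"
            "vmul.f32 %0, %0, %1\n\t"
            "vmul.f32 %0, %0, %1\n\t"
            "vmul.f32 %0, %0, %1\n\t"
            "vmul.f32 %0, %0, %1\n\t"
            "vmul.f32 %0, %0, %1\n\t"
            "vmul.f32 %0, %0, %1\n\t"
            "vmul.f32 %0, %0, %1\n\t"
            : "+t"(acc)      /* read-write accumulator */
            : "t"(factor)    /* read-only operand */
        );
#else
        /* Portable fallback with the same arithmetic result. */
        int k;
        for (k = 0; k < LOOP_STEP; k++)
            acc *= factor;
#endif
    }
    return acc;
}

/* Adds term to acc LOOP_STEP times per loop iteration. */
float additions_fp(float acc, float term, int n) {
    int i;
    for (i = 0; i < n / LOOP_STEP; i++) {
#if HAVE_VFP
        __asm__ volatile (
            "vadd.f32 %0, %0, %1\n\t"
            "vadd.f32 %0, %0, %1\n\t"
            "vadd.f32 %0, %0, %1\n\t"
            "vadd.f32 %0, %0, %1\n\t"
            "vadd.f32 %0, %0, %1\n\t"
            "vadd.f32 %0, %0, %1\n\t"
            "vadd.f32 %0, %0, %1\n\t"
            "vadd.f32 %0, %0, %1\n\t"
            "vadd.f32 %0, %0, %1\n\t"
            "vadd.f32 %0, %0, %1\n\t"
            : "+t"(acc)
            : "t"(term)
        );
#else
        int k;
        for (k = 0; k < LOOP_STEP; k++)
            acc += term;
#endif
    }
    return acc;
}
```

Returning acc, and declaring it as an in/out operand, keeps the compiler from treating the asm as unrelated to the function's result. Note that every instruction depends on the previous one, so this measures latency of a dependent chain, not peak throughput.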

Upvotes: 4
