alco

Reputation: 1

Access violation when performing matrix product using SIMD in Rust

I'm making my own linalg library for my OpenGL project and was thinking of accelerating matmul using SIMD.

Minimal reproducible example:

use std::arch::x86_64::*;

#[derive(Debug, Clone, Copy)]
struct Mat4f {pub data: [[f32; 4]; 4]}

#[repr(align(16))]
struct AlignedF32([f32; 4]);

impl Mat4f {
    pub fn new() -> Self {
        Self {data: [[0_f32; 4]; 4]}
    }
}

#[cfg(all(
    any(target_arch = "x86", target_arch = "x86_64"),
    target_feature = "sse"
))]
impl Mat4f {
    pub fn mm_simd(&self, other: &Self) -> Self {
        let a = &self.data; // transposing is omitted here
        let b = &other.data;
        let mut buffer = AlignedF32([0_f32; 4]);
        let mut product: Mat4f = Mat4f::new();
        unsafe {
            for i in 0..4 {
                for j in 0..4 {
                    let m1 = _mm_mul_ps(_mm_load_ps(b[i].as_ptr()), _mm_load_ps(a[j].as_ptr()));
                    let m2 = _mm_hadd_ps(m1, m1);
                    let m3 = _mm_hadd_ps(m2, m2);
                    _mm_store_ps(buffer.0.as_mut_ptr(), m3);
                    product.data[i][j] = buffer.0[0];
                }
            }
        }
        product
    }
}

fn main() {
    let a: Mat4f = Mat4f::new();
    let b: Mat4f = a;
    println!("{:?}", a.mm_simd(&b));
    
    // crashes if you run the following line
    // for i in 0..10{}
}

a.mm_simd(&b) causes an access violation when called inside a loop, before a loop, or after a loop. The process exits with code 0xc0000005 (STATUS_ACCESS_VIOLATION).

Not sure if SIMD is really necessary for a 4x4 matrix product. It haunts me, though, that an additional line that does nothing causes the whole thing to crash.

Upvotes: 0

Views: 82

Answers (1)

cafce25

Reputation: 27186

No alignment is enforced on Mat4f, yet you call _mm_load_ps(b[i].as_ptr()), about which the docs tell us (emphasis mine):

Load 128-bits (composed of 4 packed single-precision (32-bit) floating-point elements) from memory into dst. mem_addr must be aligned on a 16-byte boundary or a general-protection exception may be generated.

But its argument points to an array contained in an unaligned¹ Mat4f, so the call is invalid and invokes UB. That means it might work as expected (when the Mat4f happens to land on a 16-byte boundary), but it can also crash.

it haunts me though that an additional line that does nothing actually causes the whole thing to crash

Yes, that's the nature of UB, it might be affected by completely unrelated code.
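
One way to satisfy the requirement is to force the alignment on the type itself. The following is a minimal sketch, not the asker's final code: it reuses the names Mat4f and mm_simd from the question, while the #[repr(C, align(16))] attribute and the broadcast-and-accumulate scheme (swapped in for the question's _mm_hadd_ps, which needs SSE3 rather than SSE) are assumptions made for illustration. If changing the layout is not an option, _mm_loadu_ps and _mm_storeu_ps accept unaligned pointers instead.

```rust
// Sketch only: reuses the question's names; the layout attribute and the
// multiplication scheme are assumptions for illustration.
#[derive(Debug, Clone, Copy, PartialEq)]
#[repr(C, align(16))] // every Mat4f now starts on a 16-byte boundary
pub struct Mat4f {
    // Each row is 16 bytes, so every row pointer is 16-byte aligned as well.
    pub data: [[f32; 4]; 4],
}

impl Mat4f {
    pub fn new() -> Self {
        Self { data: [[0.0; 4]; 4] }
    }

    /// Row-major product `self * other` using SSE loads and stores.
    #[cfg(target_arch = "x86_64")]
    pub fn mm_simd(&self, other: &Self) -> Self {
        use std::arch::x86_64::*;
        let mut product = Mat4f::new();
        // SAFETY: SSE is part of the x86_64 baseline, and `align(16)` on the
        // struct guarantees that every row pointer is 16-byte aligned.
        unsafe {
            for i in 0..4 {
                let mut acc = _mm_setzero_ps();
                for k in 0..4 {
                    // acc += self[i][k] * (row k of other)
                    let row = _mm_load_ps(other.data[k].as_ptr());
                    let scale = _mm_set1_ps(self.data[i][k]);
                    acc = _mm_add_ps(acc, _mm_mul_ps(scale, row));
                }
                _mm_store_ps(product.data[i].as_mut_ptr(), acc);
            }
        }
        product
    }

    /// Scalar fallback so the sketch also builds on non-x86_64 targets.
    #[cfg(not(target_arch = "x86_64"))]
    pub fn mm_simd(&self, other: &Self) -> Self {
        let mut product = Mat4f::new();
        for i in 0..4 {
            for j in 0..4 {
                for k in 0..4 {
                    product.data[i][j] += self.data[i][k] * other.data[k][j];
                }
            }
        }
        product
    }
}
```

With this layout, a.mm_simd(&b) no longer depends on where the compiler happens to place a and b on the stack, so adding or removing unrelated code such as an empty loop should not change whether the aligned loads are valid.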


1) aligned only to the requirements of f32 (4 bytes), not to a 16-byte boundary

Upvotes: 3
