Atharva Dubey

Reputation: 915

SIMD store intrinsic into std::vector causes Segmentation fault

I have a function which does the following -

#include <iostream>
#include <vector>
#include <immintrin.h>

typedef struct{
    std::vector<double> x;
    std::vector<double> y;

    void reserve(size_t size){
        x.reserve(size);
        y.reserve(size);
    }

    size_t size(){
        return x.size();
    }
} Points2D;


void triangulate_simd(Points2D* points){

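    // Scalar plane coefficients; the scalar tail loop below uses these directly
    const double A = 1.5542315, B = 0.974578234, C = 9.9937, D = 8.8773;

    __m256d A_avx2 = _mm256_set1_pd(A);
    __m256d B_avx2 = _mm256_set1_pd(B);
    __m256d C_avx2 = _mm256_set1_pd(C);
    __m256d D_avx2 = _mm256_set1_pd(D);
    __m256d negative_ones = _mm256_set1_pd(-1);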

    std::vector<double> X; 
    std::vector<double> Y; 
    std::vector<double> Z; 

    auto X_data = X.data();
    auto Y_data = Y.data();
    auto Z_data = Z.data();

    auto pts_x = points->x.data(); 
    auto pts_y = points->y.data();

    X.reserve(points->x.size()); 
    Y.reserve(points->x.size()); 
    Z.reserve(points->x.size());

    auto q = size_t(points->x.size() / 8);
    //std::cout<<q<<std::endl;

    for(size_t i=0; i < q; i+=4){
       // std::cout<<i<<"  ";

        __m256d pts_x_simd = _mm256_loadu_pd(pts_x + i);
        __m256d pts_y_simd = _mm256_loadu_pd(pts_y + i);
        
        __m256d result_z = _mm256_div_pd(_mm256_mul_pd(D_avx2, negative_ones), 
            _mm256_add_pd(C_avx2, _mm256_add_pd(
                _mm256_mul_pd(B_avx2, pts_y_simd), _mm256_mul_pd(A_avx2, pts_x_simd)
            )));  // Z = -D / (C + B*pt_y + A*pt_x)

        _mm256_storeu_pd(Z_data + i, result_z);
        _mm256_storeu_pd(X_data + i, _mm256_mul_pd(_mm256_loadu_pd(pts_x + i), result_z));
        _mm256_storeu_pd(Y_data + i, _mm256_mul_pd(_mm256_loadu_pd(pts_y + i), result_z));
        
    }

    for(size_t i = q*8; i < points->x.size(); i++){

        Z_data[i] = -D / ((C + B * pts_y[i] + A * pts_x[i]));
        X_data[i] = pts_x[i] * Z_data[i];
        Y_data[i] = pts_y[i] * Z_data[i];
    }

// Do something more

    X.clear(); X.shrink_to_fit();
    Y.clear(); Y.shrink_to_fit();
    Z.clear(); Z.shrink_to_fit();

}

int main(){

    //std::vector<cv::Point2f> input;
    Points2D* point = new Points2D();

    for(size_t i = 0; i < 1e8; i++){
        point->x.push_back(double(rand()));
        point->y.push_back(double(rand()));
    }

    triangulate_simd(point);
}

However, I run into a segmentation fault at _mm256_storeu_pd(Z_data + i, result_z);
I am not sure why it throws a segmentation fault; I have double-checked the ranges. I even reserved space in the X, Y and Z vectors to make sure I am not accessing memory that has not been allocated. Could someone please point me in the right direction?

TIA

Upvotes: 0

Views: 301

Answers (1)

BoP

Reputation: 3150

Z_data holds the data pointer of an empty vector, because you used .data() before .reserve().

The pointer you saved doesn't update if you later change the size of the vector or reserve memory for it. That's why the documentation says reserve invalidates iterators (and pointers and references to elements) if the requested capacity is greater than capacity() (which was originally 0 in your case). .data() was probably nullptr to start with; .reserve() then allocated some space and updated the internal state that .data() returns, but Z_data, X_data and Y_data still hold the old value.
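A minimal sketch of the effect, using no SIMD at all. The helper name pointer_goes_stale is made up for illustration, and the assumption that an empty vector's data() differs from the buffer allocated by reserve() holds on the mainstream standard libraries but is not something the standard spells out for the empty case.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: reports whether a data() pointer fetched before
// reserve() still matches the vector's buffer afterwards.
bool pointer_goes_stale(std::size_t n) {
    std::vector<double> v;
    const double* before = v.data();  // capacity() is 0 here; typically nullptr
    v.reserve(n);                     // n > capacity(), so a new buffer is allocated
    assert(v.capacity() >= n);
    const double* after = v.data();   // points at the newly allocated buffer
    return before != after;           // true: the saved pointer is out of date
}
```

Fetching the pointer after the vector has reached its final size avoids the problem.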

Also, you are not technically allowed to write to the reserved space anyway, only to the first vector.size() elements, although writing past size() often happens to work as a hack. Using resize() instead of reserve() makes the elements exist, at the cost of zero-filling them first. With a custom allocator whose construct() default-initializes, you can get std::vector to skip actually writing the memory when increasing size, which makes it fully guaranteed safe to use the space.
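One common way to write such an allocator is sketched below. The names default_init_allocator, uninit_vector and scale are illustrative and do not come from the question; the loop is scalar to keep the example short, but the same resize-then-data() order applies to the _mm256_storeu_pd version.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <utility>
#include <vector>

// Hypothetical allocator adaptor: when the container constructs an element
// with no arguments (as resize(n) does), default-initialize it instead of
// value-initializing it, so doubles are not zero-filled.
template <typename T, typename A = std::allocator<T>>
class default_init_allocator : public A {
    using traits = std::allocator_traits<A>;
public:
    template <typename U>
    struct rebind {
        using other =
            default_init_allocator<U, typename traits::template rebind_alloc<U>>;
    };

    using A::A;

    template <typename U>
    void construct(U* ptr) {
        ::new (static_cast<void*>(ptr)) U;  // default-init: no write for double
    }

    template <typename U, typename... Args>
    void construct(U* ptr, Args&&... args) {
        traits::construct(static_cast<A&>(*this), ptr, std::forward<Args>(args)...);
    }
};

using uninit_vector = std::vector<double, default_init_allocator<double>>;

// Illustrative use: size the output first, then fetch data().
uninit_vector scale(const std::vector<double>& in, double k) {
    uninit_vector out;
    out.resize(in.size());     // elements now exist; their values are indeterminate
    assert(out.size() == in.size());
    double* dst = out.data();  // fetched after resize, so it is valid
    for (std::size_t i = 0; i < in.size(); ++i) {
        dst[i] = in[i] * k;    // every slot is written before it is read
    }
    return out;
}
```

With a plain std::vector<double> the same function works unchanged; resize() then just zero-fills the buffer before the loop overwrites it.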

Upvotes: 4
