Jamie
Jamie

Reputation: 499

std::copy() vs memcpy() with std::vectors and fftw_malloc

I know when dealing with std::vector, it's best to use std::copy(), however I was wondering how to copy an fftw_malloc array into a std::vector? Should I use std::copy() or memcpy()?

Example code:

static const int nx = 8;
static const int ny = 8;
static const int nyk = ny/2 + 1;
static const int Mx = 3*nx/2, My= 3*ny/2;

double M[2] = {Mx,My};

fftw_complex *array1;
array1= (fftw_complex*) fftw_malloc(nx*nyk* sizeof(fftw_complex)); 
memset(array1, 42, nx*nyk* sizeof(fftw_complex));

std::vector<std::complex<double>> array2(Mx*My,0.0); 

std::copy( array1, array1+(nx*nyk) , array2); //std::copy( src, src + size, dest );

I am trying to copy array1 into array2, where array2 > array1, then would like to use std::vector operators to append values, and so on.

This doesn't really work, and I am not sure if my vector syntax is off, or std::copy() is not the way to go. Should I use memcpy() instead?

Upvotes: 1

Views: 399

Answers (2)

alfC
alfC

Reputation: 16242

These days, it doesn't make a lot of difference to use fftw_malloc vs. using regularly allocated memory for FFTW operations (see timing results at the end).

Having said that, the problem in your code is two-fold. First, the last argument of std::copy needs to be an iterator (e.g. array2.begin()). Second, the value type is incompatible; you are transforming fftw_complex into std::complex<double>, which are different types (albeit they have the same bitwise representation).

So you have several options:

std::copy( reinterpret_cast<std::complex<double>*>(array1), reinterpret_cast<std::complex<double>*>(array1+(nx*nyk)) , array2.begin());

Which, in turn, can be simplified to

std::copy_n( reinterpret_cast<std::complex<double>*>(array1), nx*nyk, array2.begin());

If you don't like to reinterpret_cast you can transform the data.

std::transform( array1, array1 + nx*nyk, array2.begin(), [](auto const& z) {return std::complex<double>(z[0], z[1])});

(there is no std::transform_n).

You can skip the two-step process:

std::vector<std::complex<double>> array2(reinterpret_cast<std::complex<double>*>(array1), reinterpret_cast<std::complex<double>*>(array1+(nx*nyk)));

Honestly, I would try to make everything work with normal allocation. Besides, what do you gain by allocating with fftw_malloc to copy immediately to a regular std::vector, possibly losing the alignment properties? (I hope you are trying to copy to the vector after the FFTW operation.)

The most elegant solution is to have an FFTW-aware allocator altogether. https://gitlab.com/correaa/boost-multi/-/blob/master/include/multi/adaptors/fftw/memory.hpp?ref_type=heads#L27

using Z = std::complex<double>;
std::vector<Z, multi::fftw::allocator<Z>> array2(...);

In this way, you don't have to manage any memory and have the benefits of FFTW allocation, if any.


Finally, std::vector is not even the best container to hold data subject to FFT operation; if nothing else, because most interesting FFTs are multidimensional.

In my array library, FFTW is adapted and can be used in this way:

#include <multi/adaptors/fftw.hpp>
#include <multi/array.hpp>

namespace multi = boost::multi;  // from https://gitlab.com/correaa/boost-multi

int main() {
  multi::array<std::complex<double>, 2, multi::fftw::allocator<> > in({Mx, My});
  ...
  multi::array<std::complex<double>, 2, multi::fftw::allocator<> > out({Mx, My});
  multi::fftw::dft_forward({true, true}, in, out);
  ...
}

(code in this answer is not tested)


Timing results, fftw_malloc vs normal allocation:

I measured, and I don't see any benefit in using fftw_malloc vs. new/std::allocator.

https://gitlab.com/correaa/boost-multi/-/blob/master/include/multi/adaptors/fftw/benchmark/memory.cpp?ref_type=heads#L94

These tests are for 1D and 2D data, run in a Intel® Core™ i7-9750H × 12 (48GB), running in Ubuntu 23.04 with FFTW version libfftw3-mpi3/lunar 3.3.10-1ubuntu1 amd64 and g++ 13 with -O3 -DNDEBUG.

benchmark

Upvotes: 2

Ted Lyngmo
Ted Lyngmo

Reputation: 117148

Background: The standard makes it defined to reinterpret_cast<double*> from a std::complex<double>* and vice versa. The memory layout of a std::complex<double> is guaranteed to be that of two adjacent doubles in memory. Internally, it could be something like any of these:

template<class T>
struct complex1 {
    T m_real;
    T m_img;
};
//---
template<class T>
struct complex2 {
    T m_data[2];
};
//---
template<class T> struct complex3;

template<>
struct complex3<double> {
    double _Complex m_data; // The C complex type
};

std:complex<T> is one of the few (maybe only?) class where you have this guarantee that it's ok to reinterpret_cast a pointer to it and view it as-if through a pointer to another type, back and forth.


However, working with these different types isn't nice and clutters the code with reinterpret_casts so I would start by writing a wrapper, let's call it ComplexArray, around the raw pointers that's returned by fftw_malloc - and I'd use fftw_alloc_complex instead of the more generic fftw_malloc. The wrapper should give you as a C++ programmer the possibility of working with the data so that it can be used with the algorithms in the standard libary of C++.

I'd still use the fftw allocation functions instead of just going for a std::vector<std::complex<double>> directly (which is perfectly fine) because they "guarantee that the returned pointer obeys any special alignment restrictions imposed by any algorithm in FFTW (e.g. for SIMD acceleration)".

By providing this wrapper with iterator support, copying the data into a std::vector<std::complex<double>> becomes just as easy as when working with any C++ container. You can even create the vector directly from your wrapper's iterators.

The target would be the possibility to write your main function like this:

int main() {
    static const int nx = 8;
    static const int ny = 8;
    static const int nyk = ny / 2 + 1;
    static const int Mx = 3 * nx / 2, My = 3 * ny / 2;

    ComplexArray array1(nx * nyk); // the new C++ container

    // no more memset(..., 42, ...), use real C++ std::complex<double>s:
    array1.fill(std::complex<double>{1.4260258159703532e-105,
                                     1.4260258159703532e-105});

    // create a vector from your ComplexArray
    std::vector<std::complex<double>> array2(array1.begin(), array1.end());
}

An example implementation of ComplexArray could look like this and will work as a substitute for std::vector<std::complex<double>> in most situations you want when working with fftw so you may not even need to copy it to a vector and you can add copying and creating it from iterators and other useful things if you need. This is a start:

#include <algorithm>
#include <complex>
#include <iterator>
#include <memory>

class ComplexArray {
public:
    using value_type = std::complex<double>;

    using pointer = value_type*;
    using const_pointer = const value_type*;
    using reference = value_type&;
    using const_reference = const value_type&;

    using iterator = value_type*;
    using const_iterator = const value_type*;
    using reverse_iterator = std::reverse_iterator<iterator>;
    using reverse_const_iterator = std::reverse_iterator<const_iterator>;

    ComplexArray() = default;
    ComplexArray(std::size_t size) : m_data(fftw_alloc_complex(size)) {}

    // move semantics comes for free, copying does not:
    ComplexArray(ComplexArray&&) = default;
    ComplexArray& operator=(ComplexArray&&) = default;

    // misc ...
    std::size_t size() const { return m_size; }
    void fill(const std::complex<double>& val) { std::fill(begin(), end(), val); }

    // access
    const_pointer data() const {
        return reinterpret_cast<const_pointer>(m_data.get());
    }
    pointer data() { return reinterpret_cast<pointer>(m_data.get()); }

    const_reference operator[](std::size_t idx) const { return data()[idx]; }
    reference operator[](std::size_t idx) { return data()[idx]; }

    // functions to provide fftw functions direct access to the data
    const fftw_complex* fftw_data() const { return m_data.get(); }
    fftw_complex* fftw_data() { return m_data.get(); }

    // iterators ...
    const_iterator cbegin() const { return data(); }
    const_iterator cend() const { return data() + m_size * sizeof(value_type); }
    const_iterator begin() const { return cbegin(); }
    const_iterator end() const { return cend(); }
    iterator begin() { return data(); }
    iterator end() { return data() + m_size * sizeof(value_type); }

    // reverse iterators ...
    reverse_const_iterator crbegin() const {
        return reverse_const_iterator(cend());
    }
    reverse_const_iterator crend() const {
        return reverse_const_iterator(cbegin());
    }
    reverse_const_iterator rbegin() const { return crbegin(); }
    reverse_const_iterator rend() const { return crend(); }
    reverse_iterator rbegin() { return reverse_iterator(end()); }
    reverse_iterator rend() { return reverse_iterator(begin()); }

private:
    struct fftw_freer {
        void operator()(fftw_complex* cplx) const noexcept { fftw_free(cplx); }
    };
    std::unique_ptr<fftw_complex, fftw_freer> m_data;
    std::size_t m_size = 0;
};

Upvotes: 3

Related Questions