Reputation: 1010
The netlib documentation of sgemm states that the array strides LDA
and LDB
must be >= 1
, and large enough so that columns don't overlap. Indeed, the implementation in Apple's Accelerate/veclib framework checks these conditions and exists if they are violated.
I really don't understand why this limitation exists. Can't BLAS simply believe me that I really want a stride of zero, or a negative stride? As far as I understand, Fortran integers are signed by default, so parameter types don't seem to be the reaons (disclaimer: I don't know Fortan very well).
Indeed, there exist very reasonable use cases of non-positive array strides:
I am interested in two aspects:
Upvotes: 4
Views: 685
Reputation: 16262
EDIT 2024:
It turns out that axpy
was a bad example.
The answer is that more general BLAS functions do not generally accept stride == 0 arrays.
See bellow for details.
ANSWER from 2023 (outdated):
Regarding the case of zero strides: I am using a C++ multidimensional array library (disclaimer: my own) that supports broadcasting and BLAS interfaces. I can try many cases quickly to see what BLAS versions support zero strides.
Here is the example of axpy
; in Linux at least, all the versions of BLAS I have available support zero strides (broadcasted arrays).
#include<multi/array.hpp> // from https://gitlab.com/correaa/boost-multi
#include<multi/adaptors/blas.hpp>
namespace multi = boost::multi;
int main() {
multi::array<double, 1> y = {1.0, 2.0, 3.0};
// regular arrays, stride != 0
multi::array<double, 1> const x = {1.0, 1.0, 1.0};
multi::blas::axpy(1.0, x.begin(), y);
assert(y[0] == 2.0);
assert(y[1] == 3.0);
assert(y[2] == 4.0);
// broadcasted array of 0D into 1D, stride == 0
multi::array<double, 0> const x0(1.0);
// x0.broadcasted() is {..., 1.0, 1.0, ...}
multi::blas::axpy(1.0, x0.broadcasted().begin(), y);
assert(y[0] == 3.0);
assert(y[1] == 4.0);
assert(y[2] == 5.0);
}
The program compiles and runs correctly with BLAS (netlib), openBLAS, MKL, NVHPC (PGI), ATLAS and BLIS. (Unfortunately, I don't have a Mac device to Apple libs.)
The good news is that none of these libraries check for zero strides and give the correct answer, at least for axpy
.
c++ axpy.cpp -Iinclude/ `pkg-config --libs blas` && ./a.out
c++ axpy.cpp -Iinclude/ `pkg-config --libs openblas` && ./a.out
c++ axpy.cpp -Iinclude/ `pkg-config --libs mkl-static-ilp64-seq` && ./a.out
c++ axpy.cpp -Iinclude/ /opt/nvidia/hpc_sdk/Linux_x86_64/23.9/compilers/lib/libblas.so && ./a.out
c++ axpy.cpp -Iinclude/ `pkg-config --libs blas-atlas` && ./a.out
c++ axpy.cpp -Iinclude/ /usr/lib/x86_64-linux-gnu/blis64-openmp/libblas64.so && ./a.out
EDIT 2024:
It turns out that axpy
was a bad example.
When I try to do the same passing a (constant) array with stride 0 to GEMV, all the implementations of BLAS that I have access to give an error message and return 0 values in the destination.
$ c++ -std=c++17 -I./include/ include/boost/multi/adaptors/blas/test/gemv.cpp `pkg-config --libs blas` && ./a.out
** On entry to DGEMV parameter number 8 had an illegal value
$ c++ -std=c++17 -I./include/ include/boost/multi/adaptors/blas/test/gemv.cpp `pkg-config --libs openblas` && ./a.out
** On entry to DGEMV parameter number 8 had an illegal value
$ c++ -std=c++17 -I./include/ include/boost/multi/adaptors/blas/test/gemv.cpp `pkg-config --libs blas-atlas` && ./a.out
** On entry to DGEMV parameter number 8 had an illegal value
$ c++ -std=c++17 -I./include/ include/boost/multi/adaptors/blas/test/gemv.cpp /usr/lib/x86_64-linux-gnu/blis-openmp/libblas.so && ./a.out
** On entry to DGEMV parameter number 8 had an illegal value
$ c++ -std=c++17 -I./include/ include/boost/multi/adaptors/blas/test/gemv.cpp `pkg-config --libs mkl-static-ilp64-seq` && ./a.out
Intel oneMKL ERROR: Parameter 8 was incorrect on entry to DGEMV .
/opt/nvidia/hpc_sdk/Linux_x86_64/24.7/compilers/bin/nvc++ -std=c++17 -I./include/ include/boost/multi/adaptors/blas/test/gemv.cpp /opt/nvidia/hpc_sdk/Linux_x86_64/24.7/compilers/lib/libblas_lp64.so.0 && ./a.out
** On entry to DGEMV parameter number 8 had an illegal value
Of course, parameter number 8 is incx
, the stride of the input vector. https://www.netlib.org/lapack/explore-html/d7/dda/group__gemv.html
Even in Apple Mac the test fail (hey! at least I learned that in Mac, BLAS uses cblas). (It also terminates immediately in Mac, which I like)
BLAS error: Parameter number 9 passed to cblas_dgemv had an invalid value
This is the code to test. Note that for the version with stride 1 there is no problem.
multi::array<double, 2> const a = {
{1.0, 2.0, 3.0},
{4.0, 5.0, 6.0}
};
{
multi::array<double, 1> const ones({3}, 1.0);
assert( ones.stride() == 1 );
assert( ones[0] == 1.0 );
assert( ones[1] == 1.0 );
assert( ones[2] == 1.0 );
multi::array<double, 1> sum_by_rows({2}, 0.0);
blas::gemv_n(1.0, a.begin(), 2, ones.begin(), 0.0, sum_by_rows.begin());
std::cout << sum_by_rows[0] << " " << sum_by_rows[1] << "\n";
BOOST_TEST( std::abs( sum_by_rows[0] - (1.0 + 2.0 + 3.0)) < 1.0e-8 );
BOOST_TEST( std::abs( sum_by_rows[1] - (4.0 + 5.0 + 6.0)) < 1.0e-8 );
}
// the code below simulates the same operation with an array of stride zero
{
multi::array<double, 0> const one(1.0);
auto const& ones = one.broadcasted(); // BLAS doesn't work with stride zero
assert( ones.stride() == 0 );
assert( ones[0] == 1.0 );
assert( ones[1] == 1.0 );
assert( ones[2] == 1.0 );
multi::array<double, 1> sum_by_rows({2}, 0.0);
blas::gemv_n(1.0, a.begin(), 2, ones.begin(), 0.0, sum_by_rows.begin());
std::cout << sum_by_rows[0] << " " << sum_by_rows[1] << "\n";
assert( std::abs( sum_by_rows[0] - (1.0 + 2.0 + 3.0)) < 1.0e-8 );
assert( std::abs( sum_by_rows[1] - (4.0 + 5.0 + 6.0)) < 1.0e-8 );
}
I am going to archive the test here: https://gitlab.com/correaa/boost-multi/-/blob/master/include/boost/multi/adaptors/blas/test/gemv.cpp
Upvotes: 2
Reputation: 16262
I recently found myself effectively doing this:
double s[4] = {10., 2., 3.1, 4.1};
dscal_(4, 3., s, -1);
assert( s[1] == 2.*3. );
dscal_
is the simplest BLAS function, multiplying an array by a scalar, its signature is:
void sscal(int, double, double*, int); // doesn't seem to return anything useful
In my particular distribution of BLAS (shipped with Fedora 28) this gave a silent error, as the function doesn't seem to do anything.
Moreover, dscal_
doesn't even seem to return an error code, so there is no way I can catch this error without a wrapping function (my arrays have runtime positive or negative strides but I cannot control the value in all cases).
All these cases failed:
double s[4] = {10., 2., 3.1, 4.1};
dscal_(4, 3., s, -1); // negative incx does nothing
dscal_(4, 3., &s[4], -1); // negative incx does nothing
dscal_(-4, 3., &s[4], 1); // negative n does nothing
dscal_(-4, 3., &s[4], -1); // negative n or incx does nothing
assert( s[1] == 2. );
This tells me that, although probably not documented anywhere the stride (incx
) has to be positive, (and also the size).
Fortunately, for many BLAS function the call can be converted into positive strides.
I needed a wrapper function to debug this any way, so wrote the folloing wrapper function:
void signed_dscal(int n, double alpha, double* x, int incx){
int prod = incx*n;
if(prod > 0) dscal(abs(n), alpha, x, abs(incx));
else dscal(abs(n), alpha, x + prod, abs(incx));
}
In this way, I can call signed_dscal
with positive or negative strides and sizes.
int main(){
{
double d[4] = {10., 2., 3.1, 4.1};
signed_dscal(4, 3., d, 1);
assert( d[1] == 6. );
}
{
double d[4] = {10., 2., 3.1, 4.1};
signed_dscal(4, 3., &d[4], -1);
assert( d[1] == 6. );
}
{
double d[4] = {10., 2., 3.1, 4.1};
signed_dscal(-4, 3., &d[4], 1);
assert( d[1] == 6. );
}
{
double d[4] = {10., 2., 3.1, 4.1};
signed_dscal(-4, 3., d, -1);
assert( d[1] == 6. );
}
return 0;
}
(Note that incx=0
still is a case impossible to amend.)
I still don't understand what is the logic behind this. Perhaps some implementation of BLAS will let you do this by default and others will try to protect against infinite loops which as a side effect will assume positive stride values.
Intel BLAS seems to imply that it supports negative strides because it mentions abs(incx)
, at least for AXPY: https://www.intel.com/content/www/us/en/develop/documentation/onemkl-developer-reference-c/top/blas-and-sparse-blas-routines/blas-routines/blas-level-1-routines-and-functions/cblas-axpy.html
I didn't try and remains to be seen if it applies to other functions.
Upvotes: 2