Reputation: 714
I would like to calculate a quadratic form x' Q y in Julia.
What would be the most efficient way to calculate this for the cases:
- Q is symmetric.
- x and y are the same (x = y).
- Q is symmetric and x = y.
I know Julia has dot(). But I wonder if it is faster than a BLAS call.
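For concreteness, these are the two baselines I would compare (a minimal sketch with made-up sizes):
using LinearAlgebra
Q = rand(100, 100); x = rand(100); y = rand(100);
x' * Q * y    # naive chained product; allocates an intermediate vector
dot(x, Q, y)  # three-argument dot from the LinearAlgebra stdlib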
Upvotes: 23
Views: 1708
Reputation: 12654
Julia's LinearAlgebra stdlib has native implementations of 3-argument dot, and also a version that is specialized for symmetric/hermitian matrices. You can view the source of both methods in the LinearAlgebra stdlib.
You can confirm that they do not allocate using BenchmarkTools.@btime or BenchmarkTools.@ballocated (remember to interpolate variables using $). The symmetry of the matrix is exploited, but looking at the source, I don't see how x == y could enable any serious speedup, except perhaps saving a few array lookups.
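For example, a quick allocation check along those lines (a minimal sketch; the sizes are arbitrary):
1.7.0> using BenchmarkTools, LinearAlgebra
1.7.0> Q = rand(100, 100); x = rand(100);
1.7.0> @ballocated dot($x, $Q, $x)
0
1.7.0> @ballocated dot($x, $(Symmetric(Q)), $x)
0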
Edit: To compare the execution speed of the BLAS version and the native one, you can do
1.7.0> using BenchmarkTools, LinearAlgebra
1.7.0> X = rand(100,100); y=rand(100);
1.7.0> @btime $y' * $X * $y
42.800 μs (1 allocation: 896 bytes)
1213.5489200642382
1.7.0> @btime dot($y, $X, $y)
1.540 μs (0 allocations: 0 bytes)
1213.548920064238
This is a big win for the native version. For bigger matrices, the picture changes, though:
1.7.0> X = rand(10000,10000); y=rand(10000);
1.7.0> @btime $y' * $X * $y
33.790 ms (2 allocations: 78.17 KiB)
1.2507105095988091e7
1.7.0> @btime dot($y, $X, $y)
44.849 ms (0 allocations: 0 bytes)
1.2507105095988117e7
This is possibly because BLAS uses threads, while dot is not multithreaded. There are also some floating point differences.
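Part of that gap can be checked by pinning BLAS to a single thread and re-running the benchmarks above (a sketch; set_num_threads lives in LinearAlgebra.BLAS):
1.7.0> BLAS.set_num_threads(1)  # disable BLAS threading for a like-for-like comparison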
Upvotes: 13
Reputation: 20950
You can write an optimized loop using Tullio.jl, doing this all in one sweep, but I think it's not going to beat BLAS significantly. (Chained multiplication, a' * x * b, is also very slow, because it doesn't know there's a better algorithm; see the timing line after the benchmarks below.)
julia> using Tullio, BenchmarkTools, LinearAlgebra
julia> # a, b, x are the same as in Bogumił's answer
julia> @btime dot($a, $x, $b);
82.305 ms (0 allocations: 0 bytes)
julia> f(a, x, b) = @tullio r := a[i] * x[i,j] * b[j]
f (generic function with 1 method)
julia> @btime f($a, $x, $b);
80.430 ms (1 allocation: 16 bytes)
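For reference, the chained form mentioned above can be timed the same way (a sketch; I omit the output since it is machine-dependent, but note that it allocates an intermediate vector):
julia> @btime $a' * $x * $b;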
Adding LoopVectorization.jl may be worth it:
julia> using LoopVectorization
julia> f3(a, x, b) = @tullio r := a[i] * x[i,j] * b[j]
f3 (generic function with 1 method)
julia> @btime f3($a, $x, $b);
73.239 ms (1 allocation: 16 bytes)
But I don't know how to go about the symmetric case:
julia> @btime dot($a, $(Symmetric(x)), $b);
42.896 ms (0 allocations: 0 bytes)
although there might be linear algebra tricks to cut that down intelligently with Tullio.jl.
As with all of these, it's worth benchmarking the trade-offs on your own matrix sizes and hardware.
Upvotes: 7
Reputation: 4370
The existing answers are all good. However, a few additional points:
While Julia will indeed by default simply call BLAS here, as Oscar notes, some BLASes are faster than others. In particular, MKL will often be somewhat faster than OpenBLAS on modern x86 hardware. Fortunately, in Julia it is unusually easy to manually choose your BLAS backend. Simply typing using MKL at the REPL on Julia 1.7 or later will switch you to the MKL backend via the MKL.jl package.
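You can confirm which backend is active with BLAS.get_config(), available on Julia 1.7+ via libblastrampoline (a quick sketch):
using MKL, LinearAlgebra
BLAS.get_config()  # lists the loaded BLAS/LAPACK libraries (should show MKL after `using MKL`)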
While they are not used by default, there now exist a couple of pure-Julia linear algebra packages that can match or even beat traditional Fortran-based BLAS. In particular, Octavian.jl.
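For example, one way to use Octavian for this problem is to do the matrix-vector product with its exported matmul and finish with a dot product (a sketch, assuming Octavian.jl is installed):
using Octavian, LinearAlgebra
quadform(a, X, b) = dot(a, matmul(X, b))  # pure-Julia matvec, then a dot product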
Upvotes: 18
Reputation: 69879
If your matrix is symmetric, use the Symmetric wrapper to improve performance (a different method is called then):
julia> using LinearAlgebra, BenchmarkTools
julia> a = rand(10000); b = rand(10000);
julia> x = rand(10000, 10000); x = (x + x') / 2;
julia> y = Symmetric(x);
julia> @btime dot($a, $x, $b);
47.000 ms (0 allocations: 0 bytes)
julia> @btime dot($a, $y, $b);
27.392 ms (0 allocations: 0 bytes)
If x is the same as y, see https://discourse.julialang.org/t/most-efficient-way-to-compute-a-quadratic-matrix-form/66606 for a discussion of options (but in general it seems dot is still fast then).
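In short, for the x = y case you can simply pass the same vector twice, combined with the Symmetric wrapper from above (a minimal sketch):
julia> q(v, Q) = dot(v, Symmetric(Q), v);  # quadratic form v' Q v, exploiting symmetry
julia> q(a, x);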
Upvotes: 10