Reputation: 15
I am in the process of migrating all my Matlab code into Julia. I have an old Matlab script that produces the standard bifurcation diagram for the logistic map, quite fast: ≈0.09s for the loop, ≈0.11s for the plot to come out (Matlab 2019a). The logistic map is well known and I will be brief here: x(t+1) = r*x(t)*(1-x(t))
, 0 ≤ r ≤ 4
. For accuracy, I choose maxiter = 1000
and r = LinRange(0.0,4.0, 6001)
.
I have tried to rewrite my Matlab code into Julia, but I am still a clumsy Julia programmer. The best I could come up with was to get 1.341015 seconds (44.79 M allocations: 820.892 MiB, 4.60% gc time)
for the loop to run, and Plots.jl takes 2.266 s (6503764 allocations: 283.60 MiB)
to save a pdf file of the plot (not bad), while it takes around 17s to get the plot to be seen in the Atom plots pane (that's OK with Plots). This was done with Julia 1.5.3 (both in Atom and VS Code).
I would be grateful if someone could provide some help with my Julia code below. It runs, but it looks a bit primitive and slow. I tried to change the style and looked for performance tips (@inbounds, @simd, @avx), but always got stuck in one problem or another. It simply does not make sense to have the same loop 15 times faster in Matlab than in Julia, and I know that. Actually, there is a piece of Matlab code that I particularly like (by Steve Brunton), which is extremely simple and elegant, and (apparently) easy to be re-written in Julia; but I stumbled here as well. Brunton's loops run in just ≈0.04s, and can be found below.
Help will be appreciated. Thanks. The plot is the usual one: enter image description here
My Matlab code:
tic
hold on % required for plotting all points
maxiter = 1000;
r1 = 0;
r4 = 4;
Tot = 6001;
r = linspace(r1, r4, Tot); % Number of r values (6001 points)
np = length(r);
y = zeros(maxiter+1, Tot); % Pre-allocation
y(1,1:np) = 0.5; % Generic initial condition
for n = 1 : maxiter
y(n+1,:) = r.*y(n,:) .* (1-y(n,:)); % Iterates all the r values at once
end
toc
tic
for n = maxiter-100 : maxiter+1 % eliminates transients
plot(r,y(n,:),'b.','Markersize',0.01)
grid on
yticks([0 0.2 0.4 0.6 0.8 1])
end
hold off
toc
My Julia code:
using Plots
using BenchmarkTools
#using LoopVectorization
#using SIMD
@time begin
rs = LinRange(0.0,4.0, 6001)
#rs = collect(rs)
x1 = 0.5
maxiter = 1000 # maximum iterations
x = zeros(length(rs), maxiter) # for each starting condition (across rows)
#for k = 1:length(rs)
for k in eachindex(rs)
x[k,1] = x1 # initial condition
for j = 1 : maxiter-1
x[k, j+1] = rs[k] * x[k, j] * (1 - x[k,j])
end
end
end
@btime begin
plot(rs, x[:,end-50:end], #avoiding transients
seriestype = :scatter,
markercolor=:blue,
markerstrokecolor=:match,
markersize = 1,
markerstrokewidth = 0,
legend = false,
markeralpha = 0.3)
#xticks! = 0:1:4
xlims!(0.01,4)
end
Steve Brunton's Matlab code:
tic
xvals=[];
for beta = linspace(0,4,6001)
beta;
xold = 0.5;
%transient
for i = 1:500
xnew = (beta*(xold-xold^2));
xold = xnew;
end
%xnew = xold;
xss = xnew;
for i = 1:1000;
xnew = ((xold-xold^2)*beta);
xold = xnew;
xvals(1,length(xvals)+1) = beta; % saving beta values
xvals(2,length(xvals)) = xnew; % saving xnew values
if (abs(xnew-xss) < .001)
break
end
end
end
toc
tic
plot (xvals(1,:),xvals(2,:),'b.', 'Linewidth', 0.1, 'Markersize', 1)
grid on
%xlim([2.5 4])
toc
Upvotes: 0
Views: 1215
Reputation: 15
In the meantime, I learned that the @avx
macro from LoopVectorization.jl
should not be applied to two loops that are not independent (which is the case with these two loops), and if I change the order of the loops, performance will be enhanced 5x. So, a better piece of code to produce a bifurcation diagram for the logistic map will be something like this:
using Plots
using BenchmarkTools
#using LoopVectorization
using SIMD
function bifur!(rs, x, xs, maxiter)
x[:,1] .= xs
for j in 1:maxiter-1 #the j loop needs to be sequential, so should be the outer loop
@inbounds @simd for k in 1:length(rs) # the k loop should be the innermost since it is first on x.
x[k, j+1] = rs[k] * x[k, j] * (1 - x[k,j])
end
end
return x
end
rs = LinRange(0.0, 4.0, 6001) #rs = collect(rs)
xs = 0.5
maxiter = 1000 # maximum iterations
x = zeros(length(rs), maxiter)
bifur!(rs, x, xs, maxiter)
@benchmark bifur!($rs, x, $xs, $maxiter) setup=(x = zeros(length($rs), $maxiter)) evals=1
@btime begin
plot(rs, x[:, end-50:end], #avoid transients
#plot($rs, @view($x[:, end-50:end]),
seriestype = :scatter,
markercolor=:blue,
markerstrokecolor=:match,
markersize = 1.1,
markerstrokewidth = 0,
legend = false,
markeralpha = 0.3)
#xticks! = 0:1:4
xlims!(2.75,4)
#savefig("logistic_bifur.pdf")
end
Upvotes: -1
Reputation: 2342
I do not know, why you were flagged, maybe this way of communication is not very suitable for StackOverflow, I can recommend to use https://discourse.julialang.org which is more suitable for long discussions.
Regarding your code, there couple of things that can be improved.
using Plots
using BenchmarkTools
using LoopVectorization
function bifur(rs, xs, maxiter)
x = zeros(length(rs), maxiter) # for each starting condition (across rows)
bifur!(x, rs, xs, maxiter)
end
function bifur!(x, rs, xs, maxiter)
# @avx - LoopVectorization is broken on julia nightly, so I had to switch to other options
@inbounds @simd for k = 1 : length(rs) # macro to vectorize the loop
x[k,1] = xs # initial condition
for j = 1 : maxiter-1
x[k, j+1] = rs[k] * x[k, j] * (1 - x[k,j])
end
end
return x
end
As you can see, I split bifur
in two functions: mutating, which was denoted with exclamation mark and non mutating, which is just named bifur
. This pattern is common in Julia and helps to properly benchmark and use code. If you want faster version which do not allocate, you use mutating version. If you want slower version with guaranteed result (i.e. it does not change between different runs) you use non mutating version.
Here you can see benchmark results
julia> @benchmark bifur($rs, $xs, $maxiter)
BenchmarkTools.Trial:
memory estimate: 45.78 MiB
allocs estimate: 2
--------------
minimum time: 38.556 ms (0.00% GC)
median time: 41.440 ms (0.00% GC)
mean time: 45.371 ms (10.08% GC)
maximum time: 170.765 ms (77.25% GC)
--------------
samples: 111
evals/sample: 1
This looks reasonable - best runtime is 38 ms and 2 allocations apparently coming from allocating x
. Note also that variables xs
and others were interpolated with $
symbol. It helps to benchmark properly, more can be read in BenchmarkTools.jl manual.
We can compare it with mutation version
julia> @benchmark bifur!(x, $rs, $xs, $maxiter) setup=(x = zeros(length($rs), $maxiter))
evals = 1
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 25.574 ms (0.00% GC)
median time: 27.556 ms (0.00% GC)
mean time: 27.717 ms (0.00% GC)
maximum time: 35.223 ms (0.00% GC)
--------------
samples: 98
evals/sample: 1
I had to add setup phase and now you can see that there are no allocation, as it can be expected and it runs slightly faster, 13 ms difference is x
initialization.
Here is a note: your benchmark results are wrong, since you do not reset x
between runs (no setup phase), so x
was properly initialized on the first run, but on all other runs no additional calculations were performed. So you get something like 100 runs with first run equals to 30 ms and all other runs equals to 4ms, so on average you get 4 ms.
Now, you can use this function straightforwardly
x = bifur(rs, xs, maxiter)
@btime begin
plot($rs, @view($x[:, end-50:end]), #avoid transients
seriestype = :scatter,
markercolor=:blue,
markerstrokecolor=:match,
markersize = 1.1,
markerstrokewidth = 0,
legend = false,
markeralpha = 0.3)
#xticks! = 0:1:4
xlims!(2.75,4)
#savefig("zee_bifurcation.pdf")
end
Note that here I use interpolation again, and also @view
for proper array slicing. Thing is, expressions like x[:, end-50:end]
create new copy of the array, which can slow down calculations sometimes. Of course it is of little importance in the case of the Plots.jl
, but can be useful in other calculations.
Upvotes: 1
Reputation: 15
I was frustrated by the Julia code's poor style and performance for the bifurcation diagram of the logistic map I presented above. I am a kind of a newbie to Julia and expected some help for a problem that is not that difficult for an experienced Julia programmer. Well, looking for help, I just got knocked on my head with flags, telling me that I was deviating from my question or something else. I was not.
I feel better now. The code below runs two loops (inside a function), filling in a 6001×1000 Array{Float64,2}
in just 3.276 ms (0 allocations: 0 bytes)
, and the plot comes out in 17.660 ms (83656 allocations: 11.32 MiB)
right on the Atom's plots pane. I think it can still be improved, but (for me) it's OK as it stands. Julia rocks.
using Plots
using BenchmarkTools
using LoopVectorization
function bifur(rs, x, xs, maxiter)
@avx for k = 1 : length(rs) # macro to vectorize the loop
x[k,1] = xs # initial condition
for j = 1 : maxiter-1
x[k, j+1] = rs[k] * x[k, j] * (1 - x[k,j])
end
end
#return x
end
rs = LinRange(0.0, 4.0, 6001) #rs = collect(rs)
xs = 0.5
maxiter = 1000 # maximum iterations
x = zeros(length(rs), maxiter) # for each starting condition (across rows)
@btime begin
bifur(rs, x, xs, maxiter)
end
@benchmark bifur(rs, x, xs, maxiter)
@btime begin
plot(rs, x[:, end-50:end], #avoid transients
seriestype = :scatter,
markercolor=:blue,
markerstrokecolor=:match,
markersize = 1.1,
markerstrokewidth = 0,
legend = false,
markeralpha = 0.3)
#xticks! = 0:1:4
xlims!(2.75,4)
#savefig("zee_bifurcation.pdf")
end
The plot should look like this:
Does anyone have an idea of how to adapt Brunton's above Matlab code to Julia? I kind of like his style. It looks beautiful for teaching.
Upvotes: 0
Reputation: 2707
@time begin
rs = LinRange(0.0,4.0, 6001)
x1 = 0.5
maxiter = 1000 # maximum iterations
x = zeros(length(rs), maxiter)
for k in eachindex(rs)
x[k,1] = x1 # initial condition
for j = 1 : maxiter-1
x[k, j+1] = rs[k] * x[k, j] * (1 - x[k,j])
end
end
end
shows:
1.490238 seconds (44.79 M allocations: 820.892 MiB, 5.81% gc time)
whereas
@time begin
let
rs = LinRange(0.0,4.0, 6001)
x1 = 0.5
maxiter = 1000 # maximum iterations
x = zeros(length(rs), maxiter)
for k in eachindex(rs)
x[k,1] = x1 # initial condition
for j = 1 : maxiter-1
x[k, j+1] = rs[k] * x[k, j] * (1 - x[k,j])
end
end
end
end
shows
0.044452 seconds (2 allocations: 45.784 MiB, 29.09% gc time)
let
introduces a new scope, so your problem is that you're running in global scope.
The compiler finds it difficult to optimise code in global scope, because any variables can be accessed from any location in your (possibly 1000s of lines of) source code. As described in the manual, this is the number 1 reason why code can run slower than it should.
Upvotes: 2