Reputation: 11
I want to have a function that I can call from a struct, to mimic (to an extent) C++ class methods in Julia. To achieve this, I add a function field to a Julia struct and assign it a function object that I pass in at the constructor stage.
The problem is, it works, but the approach is roughly 1000 times slower than calling the function directly.
Below is a MWE of my code:
struct struct_wh_method{F}
    func::F
    function struct_wh_method(func_in)
        new{typeof(func_in)}(func_in)
    end
end
fun() = 1 + 1
Now, instantiating the struct object:
A = struct_wh_method(fun);
Next, I load BenchmarkTools (using rather than import, so that @btime is available unqualified):
using BenchmarkTools
I finally compare the performance between A.func() and fun():
@btime A.func()
35.583 ns (0 allocations: 0 bytes)
@btime fun()
0.035 ns (0 allocations: 0 bytes)
Is there a way to make the function call more efficient? I have a feeling that I'm doing something terribly wrong. Perhaps this is fundamentally the incorrect way of using Julia, in which case I would greatly appreciate anyone guiding me to the elegant, high-performance "Julian" way of achieving a similar goal. I greatly appreciate the help of the Stack Overflow community.
Cheers.
Upvotes: 1
Views: 583
Reputation: 20248
I'd say there are two relatively separate concerns in your question. The first one is how to reliably perform such microbenchmarks. The second one is how to achieve what you want: store a function in a struct without degrading performances.
Consider the following examples, which I think may help in understanding what goes on here.
If the benchmarked function is too simple, the compiler will be able to optimize the code away entirely and simply replace it with a pre-computed result. This usually yields sub-nanosecond benchmarks, which is a good sign that something went wrong: with CPU frequencies in the GHz range these days, any computation that takes much less than a nanosecond is suspiciously fast.
julia> too_simple(x) = x + 1
too_simple (generic function with 1 method)
julia> @btime too_simple(2)
0.026 ns (0 allocations: 0 bytes)
3
So let's first take a function complex enough that the compiler cannot optimize its code away, and call it with data small enough that we stay in the nanosecond range. My personal favorite is the sum of all elements in a vector (preferably floating-point numbers, so that the compiler can't make as many optimizations as with integer types). Note that global variables passed to benchmarked functions should be interpolated in @btime. Summing a few elements takes a few nanoseconds, so this looks like a good basis for our benchmark: we actually measure something significant, but small enough that any perturbation should be visible:
julia> function fun(x)
           acc = 0.0
           for elt in x
               acc += elt
           end
           acc
       end
fun (generic function with 1 method)
julia> x = rand(8);
julia> using BenchmarkTools
# x is a global variable => interpolate it with $x
julia> @btime fun($x)
5.454 ns (0 allocations: 0 bytes)
3.125754440231318
Now, let's naively try to embed the function into a struct:
julia> struct Bar
           func::Function
       end
julia> b = Bar(fun)
Bar(fun)
# Both `b` and `x` are global variables => interpolate them
julia> @btime $b.func($x)
22.289 ns (1 allocation: 16 bytes)
3.125754440231318
Not only have we lost some time, but there was also a memory allocation. Of course, if the payload in fun had been larger, we wouldn't have noticed any of this. Still, this is not the cost-free abstraction one might hope for.
The problem here is that the func field in Bar is not concretely typed: in Julia, each function has its own specific type (although the types of all functions are subtypes of the abstract Function type). The compiler therefore doesn't know much about the field and can't make many optimizations beforehand: it has to wait until you actually extract the func field from object b to check exactly which function it is.
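This type instability can be seen concretely with a small sketch (an addition using only Base reflection; Bar mirrors the definition above, and g is a hypothetical example function):

```julia
struct Bar
    func::Function   # declared field type is abstract
end

# The declared field type is not concrete, so the compiler cannot know
# which function is stored until runtime:
isconcretetype(fieldtype(Bar, :func))   # false

# By contrast, every function has its own concrete type, which is a
# subtype of the abstract Function type:
g() = 1
isconcretetype(typeof(g))    # true
typeof(g) <: Function        # true
```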
What you proposed in your question actually solves this by embedding the concrete type of the function as a type parameter. Note how the type of f in the example below embeds fun itself; this allows the compiler to know about fun as soon as the type of f is known (i.e. during Just-Ahead-Of-Time compilation).
julia> struct Foo{F}
           func::F
       end
julia> f = Foo(fun)
Foo{typeof(fun)}(fun)
julia> typeof(f)
Foo{typeof(fun)}
julia> @btime $f.func($x)
5.055 ns (0 allocations: 0 bytes)
3.125754440231318
Now we get the same performance as before.
In conclusion, I'd say that if you can use such a parameterized type (i.e. if you can afford two instances of your structure to have two separate types if they store different functions) then such an approach should be fine. Still, all this does not seem very Julian; you might want to consider other approaches. Maybe ask another question explaining the problem you were trying to solve with such an approach?
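One common "Julian" alternative worth mentioning (a hedged sketch, not part of the answer above): make the struct itself callable by defining a call method on its type, the so-called functor pattern. The type parameter keeps the call fully inferable:

```julia
# Functor pattern: the struct instance itself becomes callable.
struct Summer{F}
    func::F              # concrete function type as a type parameter
end

# Defining a method on the type makes every instance callable:
(s::Summer)(args...) = s.func(args...)

mysum(x) = sum(x)        # hypothetical payload function
s = Summer(mysum)
s([1.0, 2.0, 3.0])       # forwards to mysum([1.0, 2.0, 3.0]) → 6.0
```

Since the stored function's concrete type is part of the struct's type, the forwarding call can be statically dispatched, with the same zero-overhead behavior as the Foo{F} example.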
Upvotes: 1
Reputation: 3005
The difference is in the time it takes to look up your struct. If you interpolate the variable in the @btime call (note the $ below), you get the same time:
julia> @btime $A.func()
0.036 ns (0 allocations: 0 bytes)
2
julia> @btime fun()
0.036 ns (0 allocations: 0 bytes)
2
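A note of caution (an addition, not part of the original answer): timings far below a nanosecond usually mean the compiler folded the whole call into a constant, so both numbers above mostly indicate that nothing real was measured. One way to check is to inspect the emitted code via the InteractiveUtils stdlib:

```julia
using InteractiveUtils   # stdlib; provides @code_llvm

fun() = 1 + 1

# The printed LLVM IR simply returns the constant 2: the addition
# happens at compile time, which is why @btime reports ~0.03 ns
# for the "call" — far less than a single CPU clock cycle.
@code_llvm fun()
```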
Upvotes: 1
Reputation: 1460
What takes long in your example is not the call to the function itself, but accessing the field of the struct: a struct holding an Int64 takes just as long to access as one holding the function. As soon as you put some code in the function that actually does something, there won't be a recognizable difference anymore.
Here some examples:
using BenchmarkTools
struct MyStruct
    F::Function
end

struct MyStructInt
    I::Int64
end
easy_f() = 1

function hard_f()
    count = 0.0
    for i in rand(100000)
        count += i
    end
    count   # return the accumulated sum instead of discarding it
end
mseasy = MyStruct(easy_f)
mshard = MyStruct(hard_f)
msint = MyStructInt(1)
I = 1
@btime mseasy.F()
#29.826 ns (1 allocation: 16 bytes)
@btime easy_f()
#0.026 ns (0 allocations: 0 bytes)
@btime mshard.F()
#70.977 μs (3 allocations: 781.34 KiB)
@btime hard_f()
#69.223 μs (2 allocations: 781.33 KiB)
@btime msint.I
#29.282 ns (1 allocation: 16 bytes)
@btime I
#1.539 ns (0 allocations: 0 bytes)
Remarkable is the fact that getting the value of the integer field takes longer than calling the easy_f function. I guess the reason is that the compiler does a great job of optimizing the function call. (Maybe the value is even stored in the CPU cache?)
However, you can still get a slight improvement if, instead of calling the struct's field directly, you define a function that does it for you (which is the usual Julia style). For example like this:
callfunc(ms::MyStruct) = ms.F()
@btime callfunc(mseasy)
#8.606 ns (0 allocations: 0 bytes)
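Combining this accessor-function style with a parametric struct, as in the other answers, also makes the field access statically typed (a hypothetical variant; the names here are invented for illustration):

```julia
# Parametric variant: the function's concrete type is a type parameter,
# so ms.F is fully inferred and the call needs no runtime lookup.
struct MyTypedStruct{F}
    F::F
end

callfunc(ms::MyTypedStruct) = ms.F()

easy_f() = 1
mstyped = MyTypedStruct(easy_f)
callfunc(mstyped)    # → 1
```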
Upvotes: 1