In one of my applications I have to store elements of different subtypes in an array, and I take a big performance hit from JIT compilation. Below is a minimal example.
abstract A
immutable B <: A end
immutable C <: A end
b = B()
c = C()
@time getindex(A, b, b)
@time getindex(A, b, c)
@time getindex(A, c, c)
@time getindex(A, c, b)
@time getindex(A, b, c, b)
@time getindex(A, b, c, c);
0.007756 seconds (6.03 k allocations: 276.426 KB)
0.007878 seconds (5.01 k allocations: 223.087 KB)
0.005175 seconds (2.44 k allocations: 128.773 KB)
0.004276 seconds (2.42 k allocations: 127.546 KB)
0.004107 seconds (2.45 k allocations: 129.983 KB)
0.004090 seconds (2.45 k allocations: 129.983 KB)
As you can see, each time I construct an array from a different combination of element types, Julia has to JIT-compile again.
I also tried [...] instead of T[...], and it appeared to be worse.
Restart the kernel and run the following:
b = B()
c = C()
@time Base.vect(b, b)
@time Base.vect(b, c)
@time Base.vect(c, c)
@time Base.vect(c, b)
@time Base.vect(b, c, b)
@time Base.vect(b, c, c);
0.008252 seconds (6.87 k allocations: 312.395 KB)
0.149397 seconds (229.26 k allocations: 12.251 MB)
0.006778 seconds (6.86 k allocations: 312.270 KB)
0.113640 seconds (178.26 k allocations: 9.132 MB, 3.04% gc time)
0.050561 seconds (99.19 k allocations: 5.194 MB)
0.031053 seconds (72.50 k allocations: 3.661 MB)
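A plausible reason the untyped literal is worse: [b, c] lowers to Base.vect(b, c), which derives the element type from the argument types, so every combination of argument types is a distinct call signature that produces a possibly different array type. A typed literal like A[b, c] calls getindex(A, b, c) and always yields an array with element type A. The sketch below illustrates this under the assumption of Julia 1.x syntax (abstract type and struct replace the older abstract and immutable keywords); it shows the resulting types, not the compile times.

```julia
abstract type A end
struct B <: A end
struct C <: A end

b, c = B(), C()

# Untyped literals: the element type is computed from the arguments' types,
# so each combination of argument types is a separate Base.vect signature.
v_bb = [b, b]   # Vector{B}
v_bc = [b, c]   # Vector{A}: B and C have no promote rule, so their typejoin A is used

# Typed literal: A[...] calls getindex(A, ...), and the element type is always A.
w_bc = A[b, c]  # Vector{A}

println(typeof(v_bb), " ", typeof(v_bc), " ", typeof(w_bc))
```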
In my application I deal with many different subtypes: each element is of type NTuple{N, A}, where N can vary. So in the end the application was stuck in JIT compilation.
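To make the growth concrete: every tuple of subtype instances has its own concrete type that records each element's type, even though all of them are instances of NTuple{N, A}. With k concrete subtypes there are up to k^N distinct concrete tuple types of length N, and each one is a new argument type for whatever function builds the array. A small sketch, assuming Julia 1.x syntax:

```julia
abstract type A end
struct B <: A end
struct C <: A end

b, c = B(), C()

# Each tuple's concrete type records every element's type and the length N ...
t1 = (b, c)      # Tuple{B, C}
t2 = (c, b)      # Tuple{C, B}
t3 = (b, c, c)   # Tuple{B, C, C}

# ... yet all of them are instances of NTuple{N, A} for their respective N.
all_ntuples = t1 isa NTuple{2, A} && t2 isa NTuple{2, A} && t3 isa NTuple{3, A}

# Number of distinct concrete element types an array of these tuples would see
distinct_types = length(unique(typeof.((t1, t2, t3))))
```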
What's the best way to get around this? The only way I can think of is to create a wrapper, say W, and box all my elements in W before putting them into the array, so that the compiler only compiles the array function once:
immutable W
    value::NTuple
end
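A minimal sketch of the wrapper idea in Julia 1.x syntax. Here the field is typed Tuple{Vararg{A}} (that is, NTuple{N, A} for any N) to match the elements described above; that field type, and the names below, are assumptions for illustration. Because W is a single concrete type, every array literal built from wrapped values sees only W arguments.

```julia
abstract type A end
struct B <: A end
struct C <: A end

# Hypothetical wrapper following the W idea above. The field type
# Tuple{Vararg{A}} is abstract, but W itself is one concrete type.
struct W
    value::Tuple{Vararg{A}}
end

b, c = B(), C()

# Whatever tuples are wrapped, every element passed to the array literal
# has type W, so the literal always builds a Vector{W}.
ws = [W((b,)), W((b, c)), W((c, c, b))]

# Unwrapping gives back the original tuple.
inner = ws[2].value
```

The trade-off is that reading .value is not type-stable, since the field type is abstract; code that needs the concrete tuple type has to recover it at run time (for example through a function barrier).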
Thanks to @Matt B.: after overloading getindex as in his answer, I ran:
b = B()
c = C()
@time getindex(A, b, b)
@time getindex(A, b, c)
@time getindex(A, c, c)
@time getindex(A, c, b)
@time getindex(A, b, c, b)
@time getindex(A, b, c, c);
0.008493 seconds (6.43 k allocations: 289.646 KB)
0.000867 seconds (463 allocations: 19.012 KB)
0.000005 seconds (5 allocations: 240 bytes)
0.000003 seconds (5 allocations: 240 bytes)
0.004035 seconds (2.37 k allocations: 122.535 KB)
0.000003 seconds (5 allocations: 256 bytes)
I also realized that JIT compilation of tuple is actually quite efficient:
@time tuple(1,2)
@time tuple(b, b)
@time tuple(b, c)
@time tuple(c, c)
@time tuple(c, b)
@time tuple(b, c, b)
@time tuple(b, c, c);
@time tuple(b, b)
@time tuple(b, c)
@time tuple(c, c)
@time tuple(c, b)
@time tuple(b, c, b)
@time tuple(b, c, c);
0.000004 seconds (149 allocations: 10.183 KB)
0.000011 seconds (7 allocations: 336 bytes)
0.000008 seconds (7 allocations: 336 bytes)
0.000007 seconds (7 allocations: 336 bytes)
0.000007 seconds (7 allocations: 336 bytes)
0.000005 seconds (7 allocations: 352 bytes)
0.000004 seconds (7 allocations: 352 bytes)
0.000003 seconds (5 allocations: 192 bytes)
0.000004 seconds (5 allocations: 192 bytes)
0.000002 seconds (5 allocations: 192 bytes)
0.000002 seconds (5 allocations: 192 bytes)
0.000002 seconds (5 allocations: 192 bytes)
0.000002 seconds (5 allocations: 192 bytes)
The JIT heuristics here could probably be tuned better in the base library. While Julia does by default generate a specialized method for each unique combination of argument types, there are a few escape hatches you can use to reduce the number of specializations:
- Use f(T::Type) instead of f{T}(::Type{T}). Both are well-typed and behave nicely through inference, but the former will generate only one method for all types.
- Use the undocumented all-caps g(::ANY) annotation instead of g(::Any). It's semantically identical, but ANY prevents specialization on that argument.
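The two escape hatches can be sketched side by side. ::ANY no longer exists in Julia 1.x; the @nospecialize macro plays that role there, so the sketch below assumes Julia 1.x and uses it instead. The function names are hypothetical. All three functions return the same kind of result; the difference lies only in how many compiled specializations Julia is encouraged to create, which this sketch does not measure.

```julia
# Form 1: T is an ordinary argument of type Type. Julia's heuristics generally
# avoid specializing on it when T is only passed through to another function.
name_plain(T::Type) = string(T)

# Form 2: T is a static parameter, so each distinct T gets its own compiled method.
name_param(::Type{T}) where {T} = string(T)

# Julia 1.x counterpart of g(::ANY): ask the compiler not to specialize on x.
type_name(@nospecialize(x)) = string(typeof(x))

r1 = name_plain(Int)
r2 = name_param(Int)
r3 = type_name(1.5)
```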
In this case, you probably want to specialize on the type but not the values:
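function Base.getindex{T<:A}(::Type{T}, vals::ANY...)
    # Allocate an uninitialized Vector{T} of the right length
    a = Array(T,length(vals))
    # Copy each value in; vals::ANY avoids compiling a method per argument-type combination
    @inbounds for i = 1:length(vals)
        a[i] = vals[i]
    end
    return a
end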
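For readers on Julia 1.x, where the {T} method syntax and ::ANY are gone, the same idea might look like the sketch below: specialize on the element type T but mark the values with @nospecialize (a compiler hint, not a guarantee). It is given a hypothetical name, vec_of, rather than overloading Base.getindex, to keep the sketch independent of Base's own method table; that naming choice is an assumption, not part of the answer above.

```julia
abstract type A end
struct B <: A end
struct C <: A end

# Hypothetical helper (not a Base function): specializes on T, while
# @nospecialize asks the compiler not to specialize on the values' types.
function vec_of(::Type{T}, @nospecialize(vals...)) where {T<:A}
    a = Vector{T}(undef, length(vals))   # uninitialized Vector{T}
    @inbounds for i in 1:length(vals)
        a[i] = vals[i]                   # converted to T on assignment
    end
    return a
end

v = vec_of(A, B(), C(), B())
```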