While evaluating another question I stumbled upon a case where two different Julia programs generate the same code but take different amounts of time to execute.
using BenchmarkTools
test(n) = [g() for i = 1:n]
Case 1:
g() = 0;
@btime test(1000);
1.020 μs (1 allocation: 7.94 KiB)
Code 1:
code_native(g,())
.text
Filename: In[2]
pushq %rbp
movq %rsp, %rbp
Source line: 1
xorl %eax, %eax
popq %rbp
retq
nopl (%rax,%rax)
@code_native test(1000)
.text
Filename: In[1]
pushq %rbp
movq %rsp, %rbp
Source line: 2
subq $16, %rsp
xorl %eax, %eax
testq %rdi, %rdi
cmovnsq %rdi, %rax
movq $1, -16(%rbp)
movq %rax, -8(%rbp)
movabsq $collect, %rax
leaq -16(%rbp), %rdi
callq *%rax
addq $16, %rsp
popq %rbp
retq
nopw %cs:(%rax,%rax)
Case 2:
g() = UInt8(0);
@btime test(1000);
142.603 ns (1 allocation: 1.06 KiB)
Code 2:
code_native(g,())
.text
Filename: In[8]
pushq %rbp
movq %rsp, %rbp
Source line: 1
xorl %eax, %eax
popq %rbp
retq
nopl (%rax,%rax)
@code_native test(1000)
.text
Filename: In[11]
pushq %rbp
movq %rsp, %rbp
Source line: 2
subq $16, %rsp
xorl %eax, %eax
testq %rdi, %rdi
cmovnsq %rdi, %rax
movq $1, -16(%rbp)
movq %rax, -8(%rbp)
movabsq $collect, %rax
leaq -16(%rbp), %rdi
callq *%rax
addq $16, %rsp
popq %rbp
retq
nopw %cs:(%rax,%rax)
Different timings but the same code seems very strange to me. Could someone explain what is happening here?
Upvotes: 0
Views: 123
The time difference is not due to the different function g()
used in each case, but to the amount of memory that must be zeroed as a result.
In case 1, 8 bytes * 1000 = 8000 bytes need to be allocated and zeroed.
In case 2, 1 byte * 1000 = 1000 bytes need to be allocated and zeroed.
This can be seen from the results of @btime. A clearer example:
julia> @btime zeros(1000);
767.300 ns (1 allocation: 7.94 KiB)
julia> @btime zeros(125);
128.849 ns (1 allocation: 1.06 KiB)
Here zeros(n) simply returns an array of n Int zeros. Notice that the allocated amounts match the amounts in the question.
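The factor-of-eight difference follows directly from the element sizes; a quick sanity check (assuming a 64-bit system, where sizeof(Int) == 8 — the @btime figures above are slightly larger than the raw payload because they include the array header):

```julia
# Element sizes explain the ~8x difference in allocated bytes.
@assert sizeof(Int) == 8      # assumes a 64-bit system
@assert sizeof(UInt8) == 1

# Raw data payload for 1000 elements:
@assert 1000 * sizeof(Int)   == 8000   # case 1: Int zeros
@assert 1000 * sizeof(UInt8) == 1000   # case 2: UInt8 zeros
```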
UPDATE
Stefan pointed out that, curiously, the output of @code_native
for both g()
and test(Int)
is the same in both runs. This raises the question: how does the computer know whether it is allocating UInt8s or Ints?
Since g()
is redefined and test(Int)
depends on it, the world-age mechanism (introduced in Julia 0.5/0.6 to deal with redefinitions) triggers a recompilation of test(Int)
when it is invoked after the redefinition. The new test(Int)
has similar @code_native
output (on an x86 target machine), but the referenced $collect
value differs between the two compilations. To make this visible, the @code_llvm
output shows a difference in the name suffixes between the versions:
define %jl_value_t addrspace(10)* @julia_test_62122(i64) #0 !dbg !5 {
top:
:
:
%5 = call %jl_value_t addrspace(10)* @julia_collect_62123(%Generator addrspace(11)* nocapture readonly %4)
ret %jl_value_t addrspace(10)* %5
}
vs.
define %jl_value_t addrspace(10)* @julia_test_62151(i64) #0 !dbg !5 {
top:
:
:
%5 = call %jl_value_t addrspace(10)* @julia_collect_62152(%Generator addrspace(11)* nocapture readonly %4)
ret %jl_value_t addrspace(10)* %5
}
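The world-age recompilation can also be observed at the value level rather than in the generated code; a minimal sketch, run as a top-level script (each top-level statement sees the latest world, so the second call to test compiles against the new g):

```julia
g() = 0
test(n) = [g() for i = 1:n]

a = test(3)
@assert eltype(a) == Int

# Redefining g() bumps the world age; the next top-level call to test
# triggers a fresh compilation of test(Int) against the new g().
g() = UInt8(0)
b = test(3)
@assert eltype(b) == UInt8

# The element data shrinks accordingly: 8 bytes vs. 1 byte per element.
@assert sizeof(a) == 3 * sizeof(Int)
@assert sizeof(b) == 3
```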
A closer-to-the-metal approach is to dig out the machine code for the two versions:
0x55, 0x48, 0x89, 0xe5, 0x48, 0x8b, 0x06, 0x48, 0x8b, 0x38, 0x48, 0xb8, 0xa0, 0x52, 0xa1, 0x21, 0x7e, 0x7e, 0x00, 0x00, 0xff, 0xd0, 0x5d, 0xc3
vs.
0x55, 0x48, 0x89, 0xe5, 0x48, 0x8b, 0x06, 0x48, 0x8b, 0x38, 0x48, 0xb8, 0x10, 0x58, 0xa1, 0x21, 0x7e, 0x7e, 0x00, 0x00, 0xff, 0xd0, 0x5d, 0xc3
Note that 0xc3 is the x86 opcode for the ret
instruction. To get at the machine code, you need to go down the rabbit hole of the nested objects/arrays under methods(test).
Upvotes: 3