cno
cno

Reputation: 675

Julia: why is GC slow with large number of mutable structs in memory

Why is garbage collection so much slower when a large number of mutable structs are loaded in memory as compared with non-mutable structs? The object tree should have the same size in both cases.

julia> struct Foo
           a::Float64
           b::Float64
           c::Float64
       end

julia> mutable struct Bar
           a::Float64
           b::Float64
           c::Float64
       end

julia> @time dat1 = [Foo(0.0, 0.0, 0.0) for i in 1:1e9];
  9.706709 seconds (371.88 k allocations: 22.371 GiB, 0.14% gc time)

julia> @time GC.gc(true)
  0.104186 seconds, 100.00% gc time

julia> @time GC.gc(true)
  0.124675 seconds, 100.00% gc time

julia> @time dat2 = [Bar(0.0, 0.0, 0.0) for i in 1:1e9];
 71.870870 seconds (1.00 G allocations: 37.256 GiB, 73.88% gc time)

julia> @time GC.gc(true)
 47.695473 seconds, 100.00% gc time

julia> @time GC.gc(true)
 41.809898 seconds, 100.00% gc time

Upvotes: 3

Views: 581

Answers (1)

Carpe Necopinum
Carpe Necopinum

Reputation: 76

Non-mutable structs may be stored directly inside an Array. This will never happen for mutable structs. In your case, the Foo objects are all stored directly in dat1, so there is effectively just a single (albeit very large) allocation reachable after creating the Arary.

In the case of dat2, every Bar object will have its own piece of memory allocated for it and the Array will contain references to these objects. So with dat2, you end up with 1G + 1 reachable allocations.

You can also see this using Base.sizeof:

julia> sizeof(dat1)
24000000000

julia> sizeof(dat2)
8000000000

You'll see that dat1 is 3 times as large, as every array entry contains the 3 Float64s directly, while the entries in dat2 take up just the space for a pointer each.

As a side note: For these kinds of tests, it is a good idea to use BenchmarkTools.@btime instead of the built-in @time. As it removes the compilation overhead from the result and also runs your code multiple times in order to give you a more representative result:

@btime dat1 = [Foo(0.0, 0.0, 0.0) for i in 1:1e6]
  2.237 ms (2 allocations: 22.89 MiB)

@btime dat2 = [Bar(0.0, 0.0, 0.0) for i in 1:1e6]
  6.878 ms (1000002 allocations: 38.15 MiB)

As seen above, this is particularly useful to debug allocations. For dat1 we get 2 allocations (one for the Array instance itself and one for the chunk of memory where the array stores its data), while with dat2 we have an additional allocation per element.

Upvotes: 5

Related Questions