Reputation: 303
I've been using Julia for multithreaded processing of large amounts of data and observed an interesting pattern. The memory usage (reported by htop) slowly grows until the process is killed by the OS. The project is complex and it is hard to produce a suitable MWE, but I carried out a simple experiment:
using Base.Threads
f(n) = Threads.@threads for i = 1:n
    x = zeros(n)
end
Now, I called f(n) repeatedly for various values of n (somewhere between 10^4 and 10^5 on my 64 GB machine). Sometimes everything works as expected and the memory gets freed after the call returns; sometimes, however, the memory usage reported by htop stays stuck at a large value even though no computations appear to be running.
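To be concrete, the calls looked roughly like this (the specific values of n here are only illustrative; memory usage was monitored externally with htop):
for n in (10_000, 30_000, 100_000)
    f(n)                          # each call allocates n temporary vectors of length n across the threads
    println("f($n) returned")
end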
Explicit garbage collection with GC.gc() helps only a little: some memory is freed, but only a small chunk. Calling GC.gc() every so often inside the loop in f also helps, but the problem persists and, of course, performance suffers. After exiting Julia, the memory usage returns to normal (presumably reclaimed by the OS).
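For reference, the in-loop variant looked roughly like this (the name g and the collection frequency are only illustrative):
g(n) = Threads.@threads for i = 1:n
    x = zeros(n)
    # an occasional forced collection frees some memory, but slows the loop down
    i % 1000 == 0 && GC.gc()
end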
I've read about how Julia manages its memory and how memory is released back to the OS only once the allocation tally exceeds some threshold. But in my case this results in the process being killed by the OS. It seems to me that the GC somehow loses track of the allocated memory.
Could anybody please explain this behaviour and how to prevent it without slowing down the code with repeated calls to GC.gc()? And why is garbage collection broken in this way?
More details:
Here is my versioninfo() output:
julia> versioninfo()
Julia Version 0.7.0
Commit a4cb80f3ed (2018-08-08 06:46 UTC)
Platform Info:
OS: Linux (x86_64-pc-linux-gnu)
CPU: Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-6.0.0 (ORCJIT, skylake)
Environment:
JULIA_NUM_THREADS = 36
Upvotes: 12
Views: 1344
Reputation: 65
I have been experiencing similar issues when using Distributed.jl. Likewise, I have tried GC.gc() on each worker process, which alleviates memory consumption somewhat, but for large tasks the usage eventually cripples the machine/process and the only fix is to restart Julia.
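Running the collection on each worker means, roughly, the following (a sketch, not my actual code):
using Distributed

# trigger a collection on every worker process
@everywhere GC.gc()

# or, explicitly per worker:
for p in workers()
    remotecall_fetch(GC.gc, p)
end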
I could reproduce your MWE on Julia 1.7.1:
Julia Version 1.7.1
Commit ac5cc99908 (2021-12-22 19:35 UTC)
Platform Info:
OS: Linux (x86_64-pc-linux-gnu)
CPU: AMD EPYC-Rome Processor
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-12.0.1 (ORCJIT, znver2)
Environment:
JULIA_NUM_THREADS = 32
I also have an MWE using CSV.jl, which will use all available CPUs to read in a large file:
## start julia such that multiple threads are available, i.e. Threads.nthreads() > 1
Threads.nthreads()
using CSV,DataFrames
## memory usage in GB after starting julia and loading packages (the hard-coded -9 roughly cancels this machine's baseline usage):
used_mem() = (println("$(round((Sys.total_memory()-Sys.free_memory())/2^30 -9))G used"))
used_mem()
## create large file for CSV.jl to read (you can adjust n as appropriate for your machine, this maxes out at about 7.5GB on my machine)
n = 100000000
CSV.write("test.csv",DataFrame(repeat([(1,1,1)],n)))
used_mem()
## during the above process, memory peaks, but running garbage collection returns it to the original state
GC.gc()
used_mem()
## now we load in test.csv, using all available CPUs
CSV.read("test.csv",DataFrame)
## there is now a very large dataframe in ans, so memory usage is high again
used_mem()
## clear ans and collect garbage
1+1
GC.gc()
## if Threads.nthreads() > 1, memory usage is still very high, even with nothing running and no big variables to explain it
## if Threads.nthreads() == 1 (i.e. start julia with `JULIA_NUM_THREADS=1`), then used_mem() is back to its value at initialisation
varinfo()
used_mem()
I did not come up with an MWE for Distributed.jl use cases, but that is where I mainly run into problems in my application. Interestingly, some memory is freed when I kill all worker processes, but some memory remains held even after that.
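By killing all worker processes I simply mean removing them, roughly:
using Distributed

rmprocs(workers())   # in my runs this frees some, but not all, of the memory reported by the OS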
Upvotes: 0
Reputation: 4370
Since this was asked quite a while ago, hopefully this no longer occurs -- though I cannot test to be sure without an MWE.
One point which may be worth noting, however, is that the Julia garbage collector is single-threaded; i.e., there is only ever one garbage collector no matter how many threads you have generating garbage.
Consequently, if you are going to be generating a lot of garbage in a parallel workflow, it is generally good advice to use multiprocessing (e.g. MPI.jl or Distributed.jl) rather than multithreading: in multiprocessing, unlike multithreading, each process gets its own GC. A rough sketch follows.
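As an illustration (not the asker's code), the threaded loop from the question could be restructured with Distributed.jl so that the garbage is spread over worker processes, each with its own GC:
using Distributed
addprocs(4)                      # pick a worker count that suits your machine

# each worker allocates and discards its own temporary vectors, and each
# worker's GC reclaims them independently of the others
@everywhere work(n) = sum(zeros(n))

f_distributed(n) = pmap(_ -> work(n), 1:n)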
Upvotes: 1