Šimon Mandlík

Reputation: 303

Memory doesn't get freed in multiple threads

I've been using Julia for multithreaded processing of large amounts of data and observed one interesting pattern. The memory usage (reported by htop) slowly grows until the process is killed by the OS. The project is complex and it is hard to produce a suitable MWE, but I carried out a simple experiment:

using Base.Threads

function f(n)
    Threads.@threads for i in 1:n
        x = zeros(n)
    end
end

Now, I called f(n) repeatedly for various values of n (somewhere between 10^4 and 10^5 on my 64 GB machine). Sometimes everything works as expected and the memory is freed after f returns; sometimes, however, the amount of used memory reported by htop stays at a large value even though no computations appear to be running:

[screenshot: htop output showing memory usage remaining high after f(n) returns]

Explicitly calling GC.gc() helps only a little: some memory is freed, but only a small chunk. Calling GC.gc() periodically inside the loop in f also helps, but the problem persists and, of course, performance suffers. After exiting Julia, the allocated memory returns to normal (presumably reclaimed by the OS).
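For reference, a minimal sketch of the periodic-collection workaround mentioned above (the 100-iteration interval is an arbitrary choice; GC.gc(false) requests a quick incremental pass rather than a full collection):

```julia
using Base.Threads

function f_with_gc(n)
    Threads.@threads for i in 1:n
        x = zeros(n)
        sum(x)  # use x so the allocation is not optimized away
        # arbitrary interval: trades throughput for lower peak memory
        i % 100 == 0 && GC.gc(false)
    end
    return nothing
end

f_with_gc(1000)
```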

I've read about how Julia manages its memory and that memory is released back to the OS only when the internal memory tally exceeds some threshold. But in my case, this results in the process being killed by the OS. It seems to me that the GC somehow loses track of some of the allocated memory.

Could anybody please explain this behaviour and how to prevent it without slowing down the code by repeatedly calling GC.gc()? And why is garbage collection broken in this way?


Upvotes: 12

Views: 1344

Answers (2)

Leo T. Osborne Jr

Reputation: 65

I have been experiencing similar issues when using Distributed.jl. Likewise, I have tried calling GC.gc() on each worker process, which alleviates memory consumption somewhat, but for large tasks it eventually cripples the machine/process and the only fix is to restart Julia.
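For reference, a minimal sketch of forcing a collection on every worker process (assuming workers are spawned with addprocs; the worker count is arbitrary):

```julia
using Distributed
addprocs(2)  # spawn two worker processes

# request a garbage collection on the master and on every worker
@everywhere GC.gc()
```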

I could reproduce your MWE on Julia 1.7.1:

Julia Version 1.7.1
Commit ac5cc99908 (2021-12-22 19:35 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: AMD EPYC-Rome Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-12.0.1 (ORCJIT, znver2)
Environment:
  JULIA_NUM_THREADS = 32

I also have an MWE using CSV.jl, which uses all available CPU threads to read in a large file:

## start julia with multiple threads available, i.e. Threads.nthreads() > 1
Threads.nthreads()
using CSV, DataFrames
## memory usage in GB after starting julia and loading packages
## (the -9 offset is a machine-specific baseline; adjust for your system)
used_mem() = println("$(round((Sys.total_memory() - Sys.free_memory()) / 2^30 - 9))G used")
used_mem()

## create a large file for CSV.jl to read (adjust n as appropriate for your
## machine; this maxes out at about 7.5GB on mine)
n = 100000000
CSV.write("test.csv", DataFrame(repeat([(1, 1, 1)], n)))
used_mem()
## during the above, memory peaks, but garbage collection returns it to the original state
GC.gc()
used_mem()

## now load test.csv using all available threads
CSV.read("test.csv", DataFrame)
## there is now a very large DataFrame in ans, so memory usage is high again
used_mem()
## clear ans and collect garbage
1 + 1
GC.gc()

## if Threads.nthreads() > 1, memory usage is still very high, even with
## nothing running and no big variables to explain it.
## If Threads.nthreads() == 1 (i.e. start julia with JULIA_NUM_THREADS=1),
## used_mem() returns to its value at initialisation.
varinfo()
used_mem()

I did not come up with an MWE for Distributed.jl use cases, but that is where I mainly run into problems in my application. Interestingly, some memory is freed when I kill all worker processes, but some remains held even after that.

Upvotes: 0

cbk

Reputation: 4370

Since this was asked quite a while ago, hopefully this no longer occurs -- though I cannot test to be sure without an MWE.

One point which may be worth noting, however, is that the Julia garbage collector is single-threaded; that is, there is only ever one garbage collector, no matter how many threads are generating garbage.

Consequently, if you are going to generate a lot of garbage in a parallel workflow, it is generally good advice to use multiprocessing (e.g. MPI.jl or Distributed.jl) rather than multithreading. In multiprocessing, in contrast to multithreading, each process gets its own GC.
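A minimal sketch of that approach with Distributed.jl (the function heavy, the worker count, and the problem sizes are placeholders): each worker is a separate OS process with its own GC, so allocation-heavy work is collected independently per process, and removing a worker returns all of its memory to the OS.

```julia
using Distributed
addprocs(4)  # each worker is a separate process with its own GC

# placeholder workload that allocates heavily on each worker
@everywhere heavy(n) = sum(zeros(n))

# distribute eight allocation-heavy tasks across the workers
results = pmap(heavy, fill(10^6, 8))

# removing the workers returns all of their memory to the OS
rmprocs(workers())
```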

Upvotes: 1
