Reputation: 10403
My question is specific to MRI. Since Ruby 1.9, all threads are native, but MRI still runs them one at a time. That suggests parallel execution of threads is not possible in MRI, but does using threads still improve concurrency for a program that looks like this?
In other words, do you get any benefit from using threads to upload a lot of files to S3?
# https://gist.github.com/milesmatthias/25c15fd8384d4a7e76f2
...
file_number = 0
mutex = Mutex.new
threads = []
thread_count.times do |i|
  threads[i] = Thread.new {
    until files.empty?
      mutex.synchronize do
        file_number += 1
        Thread.current["file_number"] = file_number
      end
      file = files.pop rescue nil
      next unless file
      data = File.open(file)
      if File.directory?(data)
        data.close
        next
      else
        obj = s3_bucket.objects[path]
        obj.write(data, { acl: :public_read })
        data.close
      end
    end
  }
end
threads.each { |t| t.join }
...
Upvotes: 2
Views: 324
Reputation: 35483
Threads in Ruby MRI (whose VM is YARV) can speed up some kinds of work, especially I/O.
The VM runs only one thread at a time, even on multi-core processors, because of a global VM lock.
The VM has special handling for some operations, including I/O. While a thread is waiting on an I/O operation, Ruby transfers control to the next thread. Threads can also call non-Ruby code such as native C extensions, and these may be able to run in parallel.
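To make the "waiting thread hands over control" point concrete, here is a minimal sketch. It assumes `sleep` is a fair stand-in for a blocking I/O call, and `fake_io`, `WAIT` and `THREADS` are names made up for the illustration.

```ruby
require "benchmark"

# Hypothetical stand-in for a blocking I/O call (e.g. a network round trip).
# Assumption: sleep gives up the lock the way a blocking read or write would.
def fake_io(seconds)
  sleep(seconds)
  :done
end

WAIT = 0.2     # seconds each simulated call blocks
THREADS = 4    # number of concurrent waiters

results = nil
elapsed = Benchmark.realtime do
  # Start all threads first, then collect their return values.
  results = Array.new(THREADS) { Thread.new { fake_io(WAIT) } }.map(&:value)
end

puts format("%d waits of %.1fs took %.2fs in total", THREADS, WAIT, elapsed)
```

If the waits overlap as described, the total should land near a single wait rather than the sum of all four.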
When I built an I/O uploader app on AWS, our benchmarks showed that total throughput kept improving up to about 100 threads. We had a very fast network connection, and we found that most of the benefit of threading came from overlapping the comparatively long time it takes to open a new connection.
Your mileage may vary, so benchmark.
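A rough way to run that kind of benchmark is sketched below. It is not the uploader I described: `fake_upload` and `upload_all` are hypothetical helpers, the upload is simulated with a short `sleep`, and the thread counts are arbitrary. Swap in your real upload call to get meaningful numbers.

```ruby
require "benchmark"

# Hypothetical stand-in for one upload; replace with the real call when benchmarking.
def fake_upload(_file)
  sleep(0.01)
end

# Drain the file list with a fixed number of worker threads.
# Returns the number of files processed.
def upload_all(files, thread_count)
  queue = Queue.new
  files.each { |f| queue << f }
  queue.close # once closed and empty, pop returns nil and the workers stop

  done = Queue.new
  workers = Array.new(thread_count) do
    Thread.new do
      while (file = queue.pop)
        fake_upload(file)
        done << file
      end
    end
  end
  workers.each(&:join)
  done.size
end

files = Array.new(40) { |i| "file_#{i}.txt" }

# Time the same workload with different thread counts.
timings = {}
[1, 4, 16].each do |n|
  timings[n] = Benchmark.realtime { upload_all(files, n) }
end

timings.each { |n, t| puts format("%2d threads: %.3fs", n, t) }
```

A thread-safe `Queue` is used here so the workers need no explicit mutex around taking the next file. With real uploads, expect the curve to flatten once bandwidth or the remote service becomes the limit.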
Upvotes: 3
Reputation: 369594
YARV has a Giant VM Lock (GVL) that prevents two threads from entering the interpreter loop at the same time. That is true.
However, this only means that you cannot have two threads running Ruby code (or more precisely, YARV bytecode) in parallel. You can have parallel threads running C code at the same time, and the entire core library, big parts of the standard library, and some gems are written in C, not Ruby. You can have parallel threads doing or waiting for I/O. And C extensions can do what they like in parallel threads, except run Ruby code.
So, yes, threads can improve performance even on YARV.
Upvotes: 4
Reputation: 107107
Threads in Ruby improve concurrency, but do not increase parallelism. What that means is that threads allow Ruby to deal with multiple things at the same time (concurrency), but Ruby still cannot do multiple things at the same time (parallelism).
Why is this difference important? If your code needs a lot of CPU time, it will not benefit from threads, because all threads have to share the same CPU. But if your code does a lot of I/O (which usually means that the CPU is idling a lot), then your overall performance might improve a lot.
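For the CPU-bound case, a small sketch like this one can be used to check the claim on your own machine. The naive `fib` and the `INPUTS` list are only illustrative choices, and the timings are printed rather than promised, since they depend on the Ruby version and hardware.

```ruby
require "benchmark"

# Pure-Ruby, CPU-bound work: there is no I/O, so a thread never waits
# and has no natural point at which to let another thread run.
def fib(n)
  n < 2 ? n : fib(n - 1) + fib(n - 2)
end

INPUTS = [24, 24, 24, 24]

sequential = nil
threaded = nil

# Same work, first one call after another, then one thread per call.
seq_time = Benchmark.realtime { sequential = INPUTS.map { |n| fib(n) } }
thr_time = Benchmark.realtime do
  threaded = INPUTS.map { |n| Thread.new { fib(n) } }.map(&:value)
end

puts format("sequential: %.3fs, threaded: %.3fs", seq_time, thr_time)
```

On MRI I would expect the two timings to come out close to each other, in contrast to I/O-heavy work where the threaded version finishes much sooner.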
In your example (uploading files to S3) I would expect an increase in performance depending on your network bandwidth.
Upvotes: 3