Are the variables scoped in threads

Question

I'm trying to scrape some information off a site and I've never used threads before. I bashed together this test to mimic what I'm trying to do:

require 'thread'
mutex = Mutex.new
mut = Mutex.new
hash = {}
n = 0
a = []
b = []
# x = 0
10.times do |i|
 a << Thread.new(i) do |top_index|
   mutex.synchronize do
     hash[top_index] = []
     sleep 0.2
     100.times do |sub_index|
       b << Thread.new(top_index, sub_index, hash) do |t, s, my_hash|
         mut.synchronize do
           r = s
           sleep 0.2
           my_hash[t].push(s)
         end
       end
     end
     b.each {|y| y.join }
     puts "sub: #{top_index} - #{hash[top_index].length}"
     puts hash[top_index]
   end
 end
end
a.each {|q| q.join }
hash.each { |key, value| n += value.length }
puts "Final Tally - #{n}"

Here sleep stands in for some RestClient get requests, and the numbers represent the ordering and pushing of some info I've scraped from the site. But when I look at the order in which everything is entered, I notice patterns across the arrays, so I'm wondering whether assigning r in one thread affects its value in another thread. That wouldn't make sense, though, since it would severely limit the usefulness of threads for concurrent requests.

Also, I figured that since everything is concurrent (or acts like it's concurrent), it should return in a few seconds given the sleep timers, but it actually takes quite a while.

I just tested it, and it actually took longer than doing it without threads?

Threaded Total Time: 204.04028

Normal Total: 203.133638

So, now I'm very confused.

Dan Tao · Accepted Answer

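I don't know what "patterns" you're noticing, but generally speaking, the way you are using the Thread initializer in your example should work as you expect. In particular, r is first assigned inside the thread's block, so it is local to that block and each thread gets its own r.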
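To illustrate the scoping point, here is a minimal sketch (not taken from the question; the names results and threads are made up for the example). Each thread assigns its own r, waits long enough for the other threads to run, and then stores it:

```ruby
# Each thread runs its own invocation of the block, so `r` below is a
# separate local variable in every thread.
results = Array.new(5)

threads = 5.times.map do |i|
  Thread.new(i) do |index|
    r = index * 10        # block-local: other threads cannot see or change it
    sleep 0.01            # give the other threads a chance to run
    results[index] = r    # still the value this thread assigned
  end
end
threads.each(&:join)

p results
```

If r were shared between threads, some slots would end up holding another thread's value. A variable assigned outside the block before the threads start would be shared, which is why hash is visible to every thread in the question.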

"I just tested it, and it actually took longer than doing it without threads?"

This is because you're synchronizing literally all of the work these threads do, so there is zero concurrency: each outer thread holds mutex for its whole body, and each inner thread holds mut for its whole sleep. That adds up to roughly 10 × (0.2 + 100 × 0.2) = 202 seconds of sleeping, which is in line with what you measured. So it makes sense that the single-threaded solution outperforms the "multi-threaded" one, because the latter does all of the same work (in the same order) as the former, with the additional overhead of spawning threads (and making them wait).
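The effect is easy to reproduce on a small scale. The following is only a sketch: run_workers is a hypothetical helper written for this illustration, and the thread count and delay are arbitrary.

```ruby
require 'benchmark'

# Start `count` threads that each "work" (sleep) for `delay` seconds.
# With a lock, every sleep happens inside the critical section, so the
# sleeps run one after another; without it, they overlap.
def run_workers(count, delay, lock: nil)
  threads = count.times.map do
    Thread.new do
      if lock
        lock.synchronize { sleep delay }
      else
        sleep delay
      end
    end
  end
  threads.each(&:join)
end

locked   = Benchmark.realtime { run_workers(5, 0.1, lock: Mutex.new) }
unlocked = Benchmark.realtime { run_workers(5, 0.1) }

puts format('with mutex:    %.2fs', locked)    # roughly 5 * 0.1
puts format('without mutex: %.2fs', unlocked)  # roughly 0.1
```

The locked run should take about as long as doing the five sleeps sequentially, while the unlocked run should take about as long as a single sleep.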

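You don't need to synchronize these operations. The standard Ruby interpreter (MRI) has a global interpreter lock, which lets only one thread run Ruby code at a time; a single built-in call such as Array#push therefore won't be interrupted halfway, which rules out many of the race conditions developers experience in lower-level languages. Blocking calls like sleep and network I/O release the lock, so the waiting still overlaps. The lock does not make compound operations (say, reading a value and then writing it back) atomic, and implementations such as JRuby have no such lock, so a Mutex is still the right tool there, as well as when something outside Ruby land (e.g., some lower-level system operation) needs to be synchronized.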
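As a hedged aside: the interpreter lock protects individual built-in operations, but an update like counter += 1 is a read followed by a write, and that is one case where a Mutex is still commonly used, particularly on implementations without such a lock. A minimal sketch, with counter and lock as names invented for the example:

```ruby
counter = 0
lock = Mutex.new

threads = 10.times.map do
  Thread.new do
    100.times do
      # `counter += 1` is a read followed by a write; the lock makes the
      # pair behave as a single step. Only this short update is locked,
      # not any slow work around it.
      lock.synchronize { counter += 1 }
    end
  end
end
threads.each(&:join)

puts counter
```

The point is to keep the critical section as small as possible, so that anything slow (such as a request) stays outside the lock and can still overlap.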

Here's a stripped down version of your example (without synchronization) that works just fine:

require 'thread'

hash = {}
outer_threads = []

10.times do |i|
  outer_threads << Thread.new(i) do |top_index|
    hash[top_index] = []
    sleep 0.2
    # Each outer thread keeps its own list of inner threads, so it only
    # joins the threads it started itself.
    inner_threads = []
    20.times do |sub_index|
      inner_threads << Thread.new(top_index, sub_index, hash[top_index]) do |t, s, arr|
        sleep 0.2
        arr.push(s + 1)
      end
    end
    inner_threads.each(&:join)
  end
end

outer_threads.each(&:join)

# Verify that the hash is populated with arrays comprising the numbers 1 to 20,
# as we would expect.
hash.each do |key, value|
  puts "#{key}: #{value.sort.join(', ')}"
end
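A possible variation, sketched here under assumptions rather than taken from the code above: instead of pushing into shared arrays, each thread can return its result and the caller can collect it with Thread#value, which joins the thread and returns the block's result. fetch_item is a hypothetical stand-in for a real request, and the counts are arbitrary.

```ruby
# Hypothetical stand-in for a real request; it only sleeps and returns a number.
def fetch_item(page, item)
  sleep 0.05
  item + 1
end

page_threads = 3.times.map do |page|
  Thread.new(page) do |page_no|
    item_threads = 5.times.map do |item|
      Thread.new(page_no, item) { |pg, idx| fetch_item(pg, idx) }
    end
    # Thread#value waits for the thread and returns the block's result,
    # so the items come back in the order the threads were created.
    [page_no, item_threads.map(&:value)]
  end
end

results = page_threads.map(&:value).to_h
results.each { |page, items| puts "#{page}: #{items.join(', ')}" }
```

Because the results are gathered in creation order, no sorting is needed afterwards, and no thread writes to a structure that another thread is using.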
