groksrc

Reputation: 3025

Ruby Threading Issue or No Threading Issue?

Preface: We have a bit of a discussion going on about the example code below. The debate is whether or not it contains a threading issue. What we are looking for is a good answer explaining either why the issue exists or why it does not.

The example below shows the following. A class named IoBoundApiCall is defined to represent a network call. This class should be ignored unless it is relevant to the discussion for some reason; if it is, help making it irrelevant is appreciated. In our production code this is a query to the Google API. Following that, a loop sets up an array of a thousand items, where each item is a hash. This sets up the 'shared data'.

Next we have the code in question: a loop that processes the array in batches of 100. Each batch spawns 100 threads, and each thread makes the pseudo-API call and stores the result back into its hash. The results of the loop are output as YAML for examination. Note that a mutex is not used.

Program Output: The correct program output looks like the following.

---
- :id: '1'
  :data: string1
  :results:
  - '0': id local_string1 slept for 1
  - '1': id local_string1 slept for 1_copy
- :id: '2'
  :data: string2
  :results:
  - '0': id local_string2 slept for 0
  - '1': id local_string2 slept for 0_copy
.
.
.

Threading Issue Output: Unexpected output would look something like the following. Note that the results for string1 are incorrectly paired with string2, and vice versa.

---
- :id: '1'
  :data: string1
  :results:
  - '0': id local_string2 slept for 0
  - '1': id local_string2 slept for 0_copy
- :id: '2'
  :data: string2
  :results:
  - '0': id local_string1 slept for 1
  - '1': id local_string1 slept for 1_copy
.
.
.
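
One way to make "wrong pairing" checkable rather than eyeballing 1000 YAML entries is a small helper that flags any entry whose results do not mention its own data string. This is only a sketch: mismatched_entries, GOOD and SWAPPED are hypothetical names introduced here, and the sample data is copied from the two outputs above.

```ruby
# Hypothetical helper (not part of the original program): returns the entries
# whose results are missing or do not start with "id local_<own data> ".
def mismatched_entries(inputs)
  inputs.reject do |input|
    # The trailing space keeps "string1" from matching "string10".
    expected = "id local_#{input[:data]} "
    !input[:results].empty? &&
      input[:results].all? do |result|
        result.values.all? { |text| text.start_with?(expected) }
      end
  end
end

# Sample data mirroring the correct output shown above.
GOOD = [
  { id: '1', data: 'string1',
    results: [{ '0' => 'id local_string1 slept for 1' },
              { '1' => 'id local_string1 slept for 1_copy' }] },
  { id: '2', data: 'string2',
    results: [{ '0' => 'id local_string2 slept for 0' },
              { '1' => 'id local_string2 slept for 0_copy' }] }
].freeze

# Sample data mirroring the swapped "threading issue" output shown above.
SWAPPED = [
  { id: '1', data: 'string1', results: GOOD[1][:results] },
  { id: '2', data: 'string2', results: GOOD[0][:results] }
].freeze

puts "good:    #{mismatched_entries(GOOD).map { |e| e[:id] }.inspect}"
puts "swapped: #{mismatched_entries(SWAPPED).map { |e| e[:id] }.inspect}"
```

Running the same helper over the full inputs array after the threaded loop would report any entry that ended up with another entry's results.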

The question: In the following code, is it possible for a race condition to exist where a result gets stored in the wrong hash? Why or why not?

#!/usr/bin/env ruby
require 'bundler'
require 'yaml'

Bundler.require

# What this code is doesn't really matter. It's a network bound API service call.
# It's only here to make the example below work. Please ignore this class
class IoBoundApiCall
  def query(input)
    randomly = rand(0.0..1.0)
    sleep randomly
    ["id #{input} slept for #{randomly}", "id #{input} slept for #{randomly}_copy"]
  end
end

api = IoBoundApiCall.new

inputs = []

(1..1000).each do |i|
  inputs << {
    id: "#{i}",
    data: "string#{i}",
    results: []
  }
end

# This is the code in question
inputs.each_slice(100) do |batch|
  threads = []
  batch.each do |input|
    threads << Thread.new do
      data_from_hash = input[:data]
      thread_local_string = "local_#{data_from_hash}"

      questionable_results = api.query(thread_local_string)
      questionable_results.each_with_index do |questionable_result, i|
        result = {}
        result["#{i}"] = questionable_result

        # DANGER WILL ROBINSON!! THREADING ISSUE??
        input[:results] << result
      end
    end
  end
  threads.each(&:join)
end

puts inputs.to_yaml

Upvotes: 0

Views: 38

Answers (1)

Max

Reputation: 22325

With the official Ruby VM (YARV), there is no threading issue. YARV's internals are not thread-safe, so the global VM lock (GVL) allows only one thread to run Ruby code at a time. Essentially every time you touch a Ruby object, all threads but one are blocked, which prevents objects from being put into an invalid state by multiple threads stepping on each other.

The only way this code could cause a problem is if updating an input object causes some side-effect in the VM's internal state, which conflicts with another thread that is concurrently updating a different input. But that is precisely what the GVL prevents.
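
A scaled-down version of the question's loop can be used to check this empirically. The sketch below is an assumption-laden stand-in, not the original program: fake_query replaces IoBoundApiCall with a much shorter sleep, and run_batches, RESULTS and BAD are hypothetical names. Note also that each block invocation receives its own input binding, so every thread appends only to its own hash. A clean run is consistent with the explanation above but does not by itself prove the absence of a race.

```ruby
# Stand-in for the network call; the sleep is shortened so the check runs quickly.
def fake_query(input)
  delay = rand(0.0..0.01)
  sleep delay
  ["id #{input} slept for #{delay}", "id #{input} slept for #{delay}_copy"]
end

# Hypothetical wrapper around the loop from the question, with no mutex.
def run_batches(count, batch_size)
  inputs = (1..count).map { |i| { id: i.to_s, data: "string#{i}", results: [] } }
  inputs.each_slice(batch_size) do |batch|
    threads = batch.map do |input|
      # `input` is a block parameter, so this thread sees only its own hash.
      Thread.new do
        fake_query("local_#{input[:data]}").each_with_index do |text, i|
          input[:results] << { i.to_s => text }
        end
      end
    end
    threads.each(&:join)
  end
  inputs
end

RESULTS = run_batches(200, 50)

# Entries that did not receive exactly two results mentioning their own data.
BAD = RESULTS.reject do |input|
  prefix = "id local_#{input[:data]} "
  input[:results].size == 2 &&
    input[:results].all? { |r| r.values.all? { |t| t.start_with?(prefix) } }
end

puts "mismatched entries: #{BAD.size}"
```

If the pairing were ever wrong, BAD would contain the affected hashes, which could then be dumped with to_yaml for inspection.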

Upvotes: 1
