loosecannon

Reputation: 7803

Is there a limit to the number of threads that Ruby can run at once?

If not, what's the maximum while still remaining efficient?

I'm creating 14 threads. Each one opens a list of about 500 URLs and creates a new thread for each URL, which downloads it and adds it to a MySQL database. The MySQL pool size is set to 50.

This is a rake task in RoR.

Would this work better using Kernel#fork or some other method?

Upvotes: 16

Views: 14866

Answers (3)

daremkd

Reputation: 8424

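require 'open-uri'

# Build a list of 30 URLs to fetch.
urls = ['http://www.example.com'] * 30

# Work through the list in batches of 3, so at most 3 threads run at once.
urls.each_slice(3) do |group|
  group.map do |site|
    Thread.new do
      URI.open(site) # Kernel#open no longer opens URLs as of Ruby 3.0
      p 'finished'
    end
  end.each(&:join) # wait for the whole batch before starting the next one
end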

Upvotes: 18

AboutRuby

Reputation: 8116

Well, since your threads are going to be IO bound, the good news is that both Ruby 1.8 and 1.9 threads will work for this. Ruby 1.8 uses "userspace threads," meaning no real new OS threads are created when you create new threads in Ruby. This is bad for CPU multitasking, since only one Ruby thread is actually running at a time, but good for IO multitasking. Ruby 1.9 uses real OS threads. The standard interpreter still holds a global lock that lets only one thread run Ruby code at a time, but the lock is released during blocking IO, so IO-bound work like yours runs concurrently.

The number of threads you can create really depends on your system. There are of course practical limits, but you probably don't want to get anywhere near them. First, unless the servers you're downloading from are very slow and your connection is very fast, just a few threads are going to saturate your Internet connection. Also, if you're grabbing a lot of pages from a single server, throwing 500 requests at it at once from 500 threads isn't going to do any good either.

I'd start pretty small: 10 or 20 threads running at once. Increase or decrease this depending on server load, your bandwidth, etc. There's also the issue of concurrent connections to the MySQL database. Depending on how your tables are set up and how large they are, trying to insert too much data at the same time isn't going to work very well.
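One way to cap the number of threads running at once is a fixed pool of workers pulling from a shared queue. The following is only a minimal sketch, not a drop-in solution: the method name `each_in_pool` and the `pool_size` parameter are made up for illustration, it assumes Ruby 2.3 or later (for `Queue#close`), and the `sleep` call stands in for the real download and database insert.

```ruby
# Hypothetical helper: run the given block once per item, using at most
# pool_size threads. Results come back in completion order, not input order.
def each_in_pool(items, pool_size: 10, &block)
  queue = Queue.new
  items.each { |item| queue << item }
  queue.close # once a closed queue is drained, pop returns nil

  results = Queue.new
  workers = Array.new(pool_size) do
    Thread.new do
      # Assumes items are never nil or false, since nil ends the loop.
      while (item = queue.pop)
        results << block.call(item)
      end
    end
  end

  workers.each(&:join) # wait for every worker to finish
  Array.new(results.size) { results.pop }
end

# Stand-in workload: the sleep simulates a slow download.
urls = Array.new(30) { |i| "http://www.example.com/page#{i}" }
lengths = each_in_pool(urls, pool_size: 10) do |url|
  sleep 0.01
  url.length
end
puts "processed #{lengths.size} URLs"
```

Unlike batching with a join after each group, a pool keeps every worker busy until the queue is empty, so one slow download does not hold up the rest. Tuning then comes down to changing `pool_size`, which should probably stay at or below the database pool size if each worker holds a connection.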

Upvotes: 7

Bernard

Reputation: 17281

With Ruby 1.8, it's practically limited by how much memory you have. You can create tens of thousands of threads per process. The Ruby interpreter handles the management of the threads and only one or two native threads are created. It isn't true multitasking where the CPU switches between threads.

Ruby 1.9 uses native threads. The limit seems to be whatever the OS allows. Just for testing, I was able to create over 2000 threads on my Mac with Ruby 1.9 before the OS refused to create any more.

Note that having thousands of threads for a process isn't a good idea. Thread scheduling becomes a burden long before that.
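The kind of experiment described above could look roughly like the sketch below. It is not the code used for that test: the method name `count_creatable_threads` and the `cap` parameter are invented here, and it assumes that a refused thread creation surfaces as a `ThreadError`, which is what the standard interpreter raises on the systems I know of. The cap keeps the probe from exhausting the machine.

```ruby
# Hypothetical probe: start sleeping threads until creation fails or the cap
# is reached, then report how many were started.
def count_creatable_threads(cap: 500)
  threads = []
  begin
    cap.times { threads << Thread.new { sleep } } # each thread just idles
  rescue ThreadError
    # Assumed failure mode: the OS or interpreter refused another thread.
  end
  threads.size
ensure
  # Always clean up, whether or not the limit was hit.
  threads.each(&:kill)
  threads.each(&:join)
end

puts "created #{count_creatable_threads(cap: 200)} threads"
```

The number reported depends on the OS, per-user process limits, and available memory, so it will differ from machine to machine.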

Upvotes: 5
