Reputation: 7803
If not, what's the maximum while still remaining efficient?
I'm creating 14 threads. Each one opens a list of URLs (about 500) and creates a new thread for each URL, which then downloads it and adds it to a MySQL db. The MySQL pool size is set to 50.
This is a rake task in RoR.
Would this work better using Kernel#fork or some other method?
Upvotes: 16
Views: 14866
Reputation: 8424
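require 'open-uri'

# Build a list of 30 URLs to fetch.
arr = ['http://www.example.com'] * 30

# Work through the list three URLs at a time.
arr.each_slice(3) do |group|
  group.map do |site|
    Thread.new do
      URI.open(site) # Ruby 2.5+; use plain open(site) on older versions
      p 'finished'
    end
  end.each(&:join) # wait for this group before starting the next one
end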
Upvotes: 18
Reputation: 8116
Well, since your threads are going to be IO bound, the good news is that both Ruby 1.8 and 1.9 threads will work for this. Ruby 1.8 uses "userspace threads," meaning no real new OS threads are created when you create new threads in Ruby. This is bad for CPU multitasking, since only one Ruby thread is actually running at a time, but good for IO multitasking. Ruby 1.9 uses real OS threads. A global interpreter lock still lets only one of them run Ruby code at a time, but a thread waiting on IO releases the lock, so 1.9 handles IO-bound work like this well too.
The number of threads you can create really depends on your system. There are of course practical limits, but you probably don't want to get anywhere near them. First, unless the servers you're downloading from are very slow and your connection is very fast, just a few threads are going to saturate your Internet connection. Also, if you're grabbing a lot of pages from a single server, throwing 500 requests at it at once from 500 threads isn't going to do any good either.
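To see why threads help with IO-bound work, here is a minimal sketch that uses `sleep` as a stand-in for a blocking download. The names `fake_download` and `run_concurrently` are made up for illustration, and it assumes a reasonably modern MRI. Five waits of 0.2 seconds overlap instead of adding up, so the total should land well under one second.

```ruby
require 'benchmark'

# Hypothetical stand-in for a blocking download. A sleeping thread lets
# other Ruby threads run, much like a thread waiting on a socket does.
def fake_download(seconds)
  sleep(seconds)
  :done
end

# Start `count` threads at once and collect each thread's return value.
def run_concurrently(count, seconds)
  threads = Array.new(count) { Thread.new { fake_download(seconds) } }
  threads.map(&:value) # Thread#value joins the thread, then returns its result
end

elapsed = Benchmark.realtime { run_concurrently(5, 0.2) }
puts format('5 waits of 0.2s took %.2fs in total', elapsed)
```

Swapping the `sleep` for CPU-heavy work would not show the same speedup, since only one thread runs Ruby code at a time.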
I'd start pretty small: 10 or 20 threads running at once. Increase or decrease this depending on server load, your bandwidth, etc. There's also the issue of concurrent connections to the MySQL database. Depending on how your tables are set up and how large they are, trying to insert too much data at the same time isn't going to work very well.
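One way to keep "10 or 20 threads running at once" is a small worker pool fed from a queue, rather than one thread per URL. This is only a sketch: `each_in_pool` is a hypothetical helper, it relies on `Queue#close` (Ruby 2.3+), and it assumes no item is `nil` or `false`, since that would end a worker's loop early.

```ruby
# Runs the block once per item, using at most pool_size threads at a time.
# Returns [item, result] pairs in whatever order the workers finish.
def each_in_pool(items, pool_size)
  queue = Queue.new
  items.each { |item| queue << item }
  queue.close # once closed and empty, pop returns nil instead of blocking

  results = Queue.new
  workers = Array.new(pool_size) do
    Thread.new do
      # Each worker keeps pulling items until the queue is drained.
      while (item = queue.pop)
        results << [item, yield(item)]
      end
    end
  end
  workers.each(&:join)

  Array.new(results.size) { results.pop }
end

# Toy usage with a cheap block; for downloads the block could be
# something like { |url| URI.open(url).read } after require 'open-uri'.
pairs = each_in_pool(%w[a bb ccc], 2) { |word| word.length }
p pairs.sort
```

Tuning then comes down to changing `pool_size`. Keeping it at or below the database pool size (50 in the question) would also keep workers from waiting on each other for a connection.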
Upvotes: 7
Reputation: 17281
With Ruby 1.8, the limit is in practice how much memory you have. You can create tens of thousands of threads per process. The Ruby interpreter handles the management of the threads, and only one or two native threads are created. It isn't true multitasking where the CPU switches between threads.
Ruby 1.9 uses native threads. The limit seems to be whatever the OS allows. Just for testing, I can create over 2000 threads on my Mac with Ruby 1.9 before the OS disallows any more.
Note that having thousands of threads for a process isn't a good idea. Thread scheduling becomes a burden long before that.
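A rough sketch of that kind of test, for anyone who wants to try it on their own machine. `count_startable_threads` is a made-up helper, and the cap of 200 is an arbitrary, deliberately low value so the script stays harmless; raising it is at your own risk.

```ruby
# Tries to start up to `cap` idle threads and reports how many started.
def count_startable_threads(cap)
  threads = []
  begin
    cap.times { threads << Thread.new { sleep } } # each thread just parks
  rescue ThreadError
    # Thread.new raises ThreadError when no more threads can be created.
  end
  threads.size
ensure
  # Clean up so the parked threads do not outlive the measurement.
  threads.each(&:kill)
  threads.each(&:join)
end

puts count_startable_threads(200)
```

The number this reports says how many threads can exist, not how many are useful; scheduling overhead shows up long before the hard limit.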
Upvotes: 5