Reputation: 8782
I recently found myself trying to diagnose why a particular Ruby program was running slow. In the end it turned out be caused by a scaling issue causing a lot of contention on a particular mutex.
I was wondering if there are any tools that I could have used to make this issue easier to diagnose? I know I could have used ruby-prof to get detailed output of what all 100+ threads of this program were spending their time on, but I'm curious whether there is any tool that is specifically focused on just measuring mutex contention in Ruby?
Upvotes: 1
Views: 197
Reputation: 8782
So If figured out how to do this with DTrace.
Given a Ruby program like this:
# mutex.rb
mutex = Mutex.new
threads = []
threads << Thread.new do
loop do
mutex.synchronize do
sleep 2
end
end
end
threads << Thread.new do
loop do
mutex.synchronize do
sleep 4
end
end
end
threads.each(&:join)
We can use a DTrace script like this:
/* mutex.d */
ruby$target:::cmethod-entry
/copyinstr(arg0) == "Mutex" && copyinstr(arg1) == "synchronize"/
{
self->file = copyinstr(arg2);
self->line = arg3;
}
pid$target:ruby:rb_mutex_lock:entry
/self->file != NULL && self->line != NULL/
{
self->mutex_wait_start = timestamp;
}
pid$target:ruby:rb_mutex_lock:return
/self->file != NULL && self->line != NULL/
{
mutex_wait_ms = (timestamp - self->mutex_wait_start) / 1000;
printf("Thread %d acquires mutex %d after %d ms - %s:%d\n", tid, arg1, mutex_wait_ms, self->file, self->line);
self->file = NULL;
self->line = NULL;
}
When we run this script against the Ruby program, we then get information like this:
$ sudo dtrace -q -s mutex.d -c 'ruby mutex.rb'
Thread 286592 acquires mutex 4313316240 after 2 ms - mutex.rb:14
Thread 286591 acquires mutex 4313316240 after 4004183 ms - mutex.rb:6
Thread 286592 acquires mutex 4313316240 after 2004170 ms - mutex.rb:14
Thread 286592 acquires mutex 4313316240 after 6 ms - mutex.rb:14
Thread 286592 acquires mutex 4313316240 after 4 ms - mutex.rb:14
Thread 286592 acquires mutex 4313316240 after 4 ms - mutex.rb:14
Thread 286591 acquires mutex 4313316240 after 16012158 ms - mutex.rb:6
Thread 286592 acquires mutex 4313316240 after 2002593 ms - mutex.rb:14
Thread 286591 acquires mutex 4313316240 after 4001983 ms - mutex.rb:6
Thread 286592 acquires mutex 4313316240 after 2004418 ms - mutex.rb:14
Thread 286591 acquires mutex 4313316240 after 4000407 ms - mutex.rb:6
Thread 286592 acquires mutex 4313316240 after 2004163 ms - mutex.rb:14
Thread 286591 acquires mutex 4313316240 after 4003191 ms - mutex.rb:6
Thread 286591 acquires mutex 4313316240 after 2 ms - mutex.rb:6
Thread 286592 acquires mutex 4313316240 after 4005587 ms - mutex.rb:14
...
We can collect this output and use it to derive information about which mutexes are causing the most contention.
Upvotes: 1