srchulo

Reputation: 5203

perl threads exiting abnormally

I'm using Perl's threads module with a simple crawler I'm working on so I can download pages in parallel. Occasionally, I get error messages like these:

Thread 7 terminated abnormally: read timeout at /usr/lib64/perl5/threads.pm line 101.
Thread 15 terminated abnormally: Can't connect to burgundywinecompany.com:80 (connect: timeout) at /usr/lib64/perl5/threads.pm line 101.
Thread 19 terminated abnormally: write failed: Connection reset by peer at /usr/lib64/perl5/threads.pm line 101.

When I run the script linearly without threads, I do not encounter these errors. They look like they come from the LWP::UserAgent module, but they do not seem like they should cause the threads to exit abnormally. Is there some extra precaution I have to take when using Perl's threads? Thanks!

UPDATE:

I have tracked down the source of these abnormal terminations, and it does seem to be whenever I make a request using LWP::UserAgent. If I remove the method call to download the webpage, then the errors stop.

Sample Script

The script below causes one of the errors I'm speaking of. The last URL will time out, and what should just be part of the HTTP::Response object instead causes a thread to terminate abnormally:

#!/usr/bin/perl
use threads;
use Thread::Queue;
use LWP::UserAgent;

my $THREADS=10; # Number of threads
                             #(if you care about them)
my $workq = Thread::Queue->new(); # Work to do

my @stufftodo = qw(http://www.collectorsarmoury.com/ http://burgundywinecompany.com/ http://beetreeminiatures.com/);

$workq->enqueue(@stufftodo); # Queue up some work to do
$workq->enqueue("EXIT") for(1..$THREADS); # And tell them when

threads->create("Handle_Work") for(1..$THREADS); # Spawn our workers

$_->join for threads->list;

sub Handle_Work {
    while(my $todo=$workq->dequeue()) {
        last if $todo eq 'EXIT'; # All done
        print "$todo\n";
        my $ua = LWP::UserAgent->new;
        my $RESP = $ua->get($todo);
    }
    threads->exit(0);
}

Upvotes: 3

Views: 4317

Answers (3)

RobEarl

Reputation: 7912

It looks like the get method is setting $@ even though it doesn't die. You can see it isn't dying by putting some prints after the get:

my $RESP = $ua->get($todo);
if($RESP->is_success) {
    print "$todo success\n";
} else {
    print "$todo failed: ".$RESP->status_line."\n";
}

You can see the print after a failed request still happens before the thread exits:

http://www.collectorsarmoury.com/ success
http://burgundywinecompany.com/ success
http://beetreeminiatures.com/ failed: 500 Can't connect to beetreeminiatures.com:80 (Connection timed out)
Thread 3 terminated abnormally: Can't connect to beetreeminiatures.com:80 (Connection timed out)

The thread exit then appears to pick up on $@ being set and treats it as abnormal. If you reset $@ before exiting the thread (or use local $@ in Handle_Work, or wrap the get in an eval), the thread exits cleanly.
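
A minimal sketch of the local $@ option, assuming the behaviour described above (a call that traps its own error, returns normally, and leaves $@ set). The fake_get and handle_url subs are hypothetical stand-ins, not part of LWP; no network or threads are involved, so this only shows how the leftover $@ is contained:

```perl
use strict;
use warnings;

# Hypothetical stand-in for $ua->get: it traps its own error internally,
# returns a status string, and leaves $@ set as a side effect.
sub fake_get {
    my ($url) = @_;
    eval { die "read timeout\n" };
    return "500 read timeout";
}

# Worker body: localizing $@ means whatever fake_get leaves behind
# is discarded when this sub returns.
sub handle_url {
    my ($url) = @_;
    local $@;
    my $status = fake_get($url);
    return $status;
}

$@ = '';
my $status = handle_url('http://example.invalid/');
my $leaked = $@;    # what the caller sees after the worker returns

print "status: $status\n";
print "leaked: ", ($leaked eq '' ? 'no' : 'yes'), "\n";
```

In the original script the equivalent change would be a local $@; at the top of Handle_Work, so that nothing left over from the request is visible when the thread finishes.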

Upvotes: 2

amon

Reputation: 57590

I played a bit with your source and came up with this:

#!/usr/bin/perl

use 5.012; use warnings;
use threads; use Thread::Queue; use LWP::UserAgent;

use constant THREADS => 10;

my $queue = Thread::Queue->new();
my @URLs =  qw( http://www.collectorsarmoury.com/
                http://burgundywinecompany.com/
                http://beetreeminiatures.com/       );
my @threads;

for (1..THREADS) {
    push @threads, threads->create(sub {
        my $ua = LWP::UserAgent->new;
        $ua->timeout(5); # short timeout for easy testing.
        while(my $task = $queue->dequeue) {
            my $response = eval{ $ua->get($task)->status_line };
            say "$task --> $response";
        }
    });
}

$queue->enqueue(@URLs);
$queue->enqueue(undef) for 1..THREADS;
# ... here work is done
$_->join foreach @threads;

Output:

http://www.collectorsarmoury.com/ --> 200 OK
http://burgundywinecompany.com/ --> 200 OK
http://beetreeminiatures.com/ --> 500 Can't connect to beetreeminiatures.com:80 (timeout)

Output without eval:

http://www.collectorsarmoury.com/ --> 200 OK
http://burgundywinecompany.com/ --> 200 OK
http://beetreeminiatures.com/ --> 500 Can't connect to beetreeminiatures.com:80 (timeout)
Thread 2 terminated abnormally: Can't connect to beetreeminiatures.com:80 (timeout)

LWP::Protocol::http::Socket: connect: timeout at /usr/share/perl5/LWP/Protocol/http.pm line 51.

Things I do differently are:

unimportant:

  • I don't exit my threads; I just drop off at the end (implicit return).
  • I allocate one User Agent per thread, not one per request.

better style:

  • I use undef to signal thread termination: Once a false value is dequeued, the loop condition is false anyway and the thread terminates. If you want to pass a special string to signal termination, you should loop with while (1), and dequeue inside the loop body.
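
The while (1) variant mentioned above could look roughly like this. It is a sketch only: the queue is drained in the main thread so it runs without spawning workers, the 'EXIT' sentinel is taken from the question, and the '0' item is an assumed example of a false value that should not end the loop:

```perl
use strict;
use warnings;
use Thread::Queue;

my $queue = Thread::Queue->new();

# '0' is a false value; with while (my $task = $queue->dequeue) it would
# end the loop early. The explicit sentinel check below avoids that.
$queue->enqueue('http://a.example/', '0', 'http://b.example/', 'EXIT');

my @seen;
while (1) {
    my $task = $queue->dequeue();    # dequeue inside the loop body
    last if $task eq 'EXIT';         # only the sentinel stops the worker
    push @seen, $task;
}

print "processed: @seen\n";
```

In a real worker the same loop would sit inside the sub passed to threads->create, with one 'EXIT' enqueued per thread.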

important:

  • To silence those pesky errors, I eval'd the get. Should the request die, my thread doesn't follow suit but keeps calm and carries on.

This matters because getting the URL can actually die. If we look at line 51 of the source of LWP::Protocol::http, we see that a fatal error will be raised if no socket can be created for the connection. This could happen when the hostname can't be resolved.

In my code I decided to ignore the error (as I already print the status line). Depending on the problem, you might want to retry the URL, or give a more informative warning. See the linked source for a good example of error handling.
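
As an illustration of the retry idea, here is a small sketch. The with_retries helper and the $flaky coderef are hypothetical and stand in for a real request; the point is only that each attempt runs inside eval, so a die is trapped and can be retried or reported:

```perl
use strict;
use warnings;

# Hypothetical helper: run $code up to $max_tries times, trapping die
# on each attempt. Returns (result, attempts used) on success, or
# (undef, attempts used, last error) if every attempt died.
sub with_retries {
    my ($code, $max_tries) = @_;
    my $last_error = '';
    for my $attempt (1 .. $max_tries) {
        my $result;
        my $ok = eval { $result = $code->($attempt); 1 };
        return ($result, $attempt) if $ok;
        $last_error = $@ || 'unknown error';
    }
    return (undef, $max_tries, $last_error);
}

# Stand-in for a request that dies twice, then succeeds.
my $flaky = sub {
    my ($attempt) = @_;
    die "connect: timeout\n" if $attempt < 3;
    return '200 OK';
};

my ($status, $tries, $error) = with_retries($flaky, 5);
print defined $status ? "$status after $tries tries\n" : "gave up: $error";
```

Inside a worker thread, the coderef would wrap the actual $ua->get($task) call, and the failure branch is where a more informative warning could go.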

Unfortunately, I couldn't reproduce your exact errors (the line given in your warnings points to the threads->exit() class method). However, in most cases, using eval should prevent abnormal termination.

Upvotes: 3

Darryl Miles

Reputation: 4631

Well, Perl does have a mechanism to abort with a fatal error, but I don't think that is the case for you.

If you take a look at threads.pm line 101, this may be the thread exit method, and calling it with a non-zero exit status might be considered an abnormal condition.

I think these things are harmless and the use of 'terminate abnormally' is just an indication that the operation was not 100% successful. This means you should plan and implement a recovery scenario for those threads whose operations did not complete.

To you the choice of words is alarming and causes concern, but if you change the message to read "Thread 123 did not complete indicating success", it might seem less alarming and more in line with what is happening.

It is also better to allow the thread's main method to return (deallocating data on the way if necessary) instead of calling threads->exit, unless of course that is done as the last thing in the main method.

With regard to forking, are you claiming it never fails when forking, and does the forked process indicate failure with a non-zero exit status? Also, are you sure you are not overloading the website(s), proxy, network, or whatever else when using threads?

Upvotes: 0
