Reputation: 5477
I'm running a local test server on Windows 7 with XAMPP. I'm working on a web crawler, but when I open the script in my browser I get a "Premature end of script headers" error. I thought this was caused by not including print "Content-Type: text/html\n\n"; which is usually the issue, but that wasn't it.
This is the code I'm using:
#!"\xampp\perl\bin\perl.exe"
print "Content-Type: text/html\n\n";
use strict;
use warnings;
use LWP::Simple;
use LWP::UserAgent;
use HTTP::Request;
use HTTP::Response;
use HTML::LinkExtor;
open my $file1,"+>>", ("links.txt");
select($file1);
my @urls = ('http://www.youtube.com/');
my $browser = LWP::UserAgent->new('IE 6');
$browser->timeout(10);
while (@urls) {
    my $url = shift @urls;
    my $request = HTTP::Request->new(GET => $URL);
    my $response = $browser->request($request);
    if ($response->is_error()) {printf "%s\n", $response->status_line;}
    my $contents = $response->content();
    my ($page_parser) = HTML::LinkExtor->new(undef, $url);
    $page_parser->parse($contents)->eof;
    @links = $page_parser->links;
    foreach $link (@links) {
        push @urls, $$link[2]; # Add link to list of urls before printing it
        print "$$link[2]\n";
    }
    sleep 60;
}
Upvotes: 1
Views: 3039
Reputation: 479
At first glance, the code you posted contains several errors that prevent it from running. First, $URL, @links and $link are not declared (remember that you are under strict, and that $URL is not the same variable as $url). Second, LWP::UserAgent->new() doesn't accept an odd number of arguments, since it expects a hash of key/value options.
Since the error you get can simply mean that the script stopped before it returned any output to the web server, those errors alone could be the cause.
It can be helpful to run your script from the command line first, just to check that it returns anything.
UPDATE
Yes, just by correcting the above-mentioned errors, your script seems to work (on Linux, from the command line). It still produces several warnings (and performs some unnecessary operations), which should be eliminated as well.
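As a rough sketch of what those corrections might look like (not necessarily the exact changes I made): every variable is declared with my, the user agent gets its options as key/value pairs, and link extraction is pulled into a helper. The helper names extract_links and fetch_links are made up for this example, and filtering on <a href> is my assumption about which links you actually want.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTML::LinkExtor;

# Pass options as key/value pairs rather than a single bare string.
my $browser = LWP::UserAgent->new(agent => 'IE 6', timeout => 10);

# Hypothetical helper: return the href targets of <a> tags as absolute URL strings.
sub extract_links {
    my ($html, $base) = @_;
    my $parser = HTML::LinkExtor->new(undef, $base);
    $parser->parse($html);
    $parser->eof;
    my @found;
    for my $link ($parser->links) {
        # Each entry is [tag, attribute => url, ...]
        my ($tag, %attr) = @$link;
        push @found, "$attr{href}" if $tag eq 'a' && defined $attr{href};
    }
    return @found;
}

# Hypothetical helper: fetch one page and return its links, or nothing on failure.
sub fetch_links {
    my ($ua, $url) = @_;    # lexical $url, lower case, as in the rest of the script
    my $response = $ua->get($url);
    unless ($response->is_success) {
        warn $response->status_line, "\n";
        return;
    }
    return extract_links($response->decoded_content // '', $url);
}
```

Inside your while loop you could then write my @links = fetch_links($browser, $url); and push those onto @urls, which also avoids parsing the body of a failed response.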
Upvotes: 4
Reputation: 385506
I thought I got this from not including print "Content-Type: text/html\n\n";
Not exactly. You didn't demonstrate that the print got run, and you didn't demonstrate that the print got run before other output.
A compile-time error surely happened, in which case the print statement never got executed. Check your web server's error log for the actual error.
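To illustrate the compile-time versus run-time difference, here is a minimal sketch, assuming you only want to confirm that the header reaches the server. A BEGIN block runs as soon as it has been parsed, so a print placed there is executed even if a later line fails to compile. The shebang line from the question is omitted, and the text/plain type and the message are just placeholders.

```perl
# Runs during compilation, before the rest of the file is parsed.
BEGIN {
    $| = 1;                                  # unbuffer STDOUT
    print "Content-Type: text/plain\n\n";    # header is sent first
}

use strict;
use warnings;

# Ordinary run-time code: only reached if the whole file compiled.
my $url = 'http://www.youtube.com/';
print "Would fetch: $url\n";
```

With this in place the server receives a header even when the rest of the script doesn't compile. The compile error itself still ends up in the error log, so that remains the place to look.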
Upvotes: 5