I'm making a torrent crawler in PHP and I have a problem. Here's my code:
// ... the cURL codes (they're working) ...
// Contents of the Page
$contents = curl_exec($crawler->curl);
// Find the Title
$pattern = "/<title>(.*?)<\/title>/s";
preg_match($pattern, $contents, $titlematches);
echo "Title - ".$titlematches[1]."<br/>";
// Find the Category
$pattern = "/Тип<\/td><td(?>[^>]+)>((?>[^<]+))<\/td>/s";
preg_match($pattern, $contents, $categorymatches);
echo "Category - ".$categorymatches[1]."<br/>";
The HTML page ("Тип" means Category and "Филми" means Movies):
<title>The Matrix</title>
<!--Some Codes Here--!>
<tr><td>Тип</td><td valign="top" align=left>Филми</td></tr>
<!--Some Codes Here--!>
The Result:
Title - The Matrix
Notice: Undefined offset: 1 in /var/www/spider.php on line 117
It shows the title but not the category. Why is that?
I've tried to echo $categorymatches[0], $categorymatches[2], and $categorymatches[3] without any luck.
Upvotes: 3
Views: 8846
Reputation: 212452
You're assuming that preg_match actually finds a match. It's better to test if it did so.
$pattern = "/<title>(.*?)<\/title>/s";
$matchCount = preg_match($pattern, $contents, $titlematches);
if ($matchCount > 0) {
echo $titlematches[1]."<br/>";
} else {
// do something else, 'cos no match found
}
Note that you might want to use a switch or two with preg_match. This pattern will only find a result if "title" is used, not "TITLE" or "Title", so the case-insensitive /i switch might be an idea. The <title> tag might also be on a different line from the value and from the closing </title>; the /s switch already in your pattern covers that, because it lets . match newlines (the multiline /m switch only changes where ^ and $ anchor).
The same principle applies to all your preg_match checks.
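One way to apply that check to every pattern is to wrap it in a small function. This is only a sketch: firstCapture is a hypothetical helper name, and the $contents string below is sample markup standing in for the cURL response. The "Category" label is deliberately absent from the sample so that the no-match branch is exercised.

```php
<?php
// Hypothetical helper (not part of the original code): returns the first
// capture group, or null when the pattern does not match or fails.
function firstCapture(string $pattern, string $subject): ?string
{
    // preg_match returns 1 on a match, 0 on no match, false on error
    if (preg_match($pattern, $subject, $matches) === 1 && isset($matches[1])) {
        return $matches[1];
    }
    return null;
}

// Sample markup standing in for the page fetched with cURL
$contents = "<TITLE>\nThe Matrix\n</TITLE>\n"
          . "<tr><td>Type</td><td valign=\"top\">Movies</td></tr>";

// /i accepts <TITLE>, /s lets . cross the line breaks around the value
$title = firstCapture('/<title>(.*?)<\/title>/si', $contents);

// This label does not occur in the sample, so the result is null
$category = firstCapture('/Category<\/td><td[^>]*>([^<]+)<\/td>/s', $contents);

echo "Title - " . ($title !== null ? trim($title) : "(not found)") . "<br/>\n";
echo "Category - " . ($category ?? "(not found)") . "<br/>\n";
```

Returning null instead of reading $matches[1] directly avoids the "Undefined offset: 1" notice and leaves it to the caller to decide what a missing field means.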
EDIT
It looks as though your category match is testing for a UTF-8 string, so try using the /u switch.
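A minimal sketch of the /u suggestion, assuming the script file itself is saved as UTF-8. The first subject is the row from the question; the second is an assumed example of the same label as Windows-1251 bytes, included only to show what preg_match reports when the fetched page is not valid UTF-8. Whether the real page is encoded that way is not known from the question.

```php
<?php
// Category pattern with /u: pattern and subject are treated as UTF-8
$pattern = '/Тип<\/td><td[^>]*>([^<]+)<\/td>/u';

// Row from the question, as UTF-8 text
$utf8Row = '<tr><td>Тип</td><td valign="top" align=left>Филми</td></tr>';

$ok = preg_match($pattern, $utf8Row, $m);
if ($ok === 1) {
    echo "Category - " . $m[1] . "<br/>\n";
}

// Assumed example: "Тип" written as Windows-1251 bytes, which is not
// valid UTF-8. With /u, preg_match returns false instead of 0.
$cp1251Row = "<tr><td>\xD2\xE8\xEF</td><td>x</td></tr>";

$bad = preg_match($pattern, $cp1251Row, $m2);
$err = preg_last_error();

if ($bad === false && $err === PREG_BAD_UTF8_ERROR) {
    echo "Subject is not valid UTF-8; convert it before matching<br/>\n";
}
```

Checking for false separately from 0 distinguishes "the pattern did not match" from "the subject could not be read as UTF-8", which need different fixes.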
Upvotes: 6