Reputation: 2272
I own a dedicated server running CentOS 7 with a web panel and PHP 8.1.13 installed.
Everything had been working fine for the last 12 days, until I tried to run a simple web crawling script:
<?php
include "simple_html_dom.php";
$html = file_get_html("https://www.bbc.com");
echo $html;
foreach ($html->find("div li") as $h)
{
    echo $h->text();
}
?>
It gave me the following errors:
Warning: file_get_contents(): php_network_getaddresses: getaddrinfo for www.bbc.com failed: Name or service not known in home/myserver/public_html/news/simple_html_dom.php on line 82
Warning: file_get_contents(https://www.bbc.com): Failed to open stream: php_network_getaddresses: getaddrinfo for www.bbc.com failed: Name or service not known in /home/myserver/public_html/news/simple_html_dom.php on line 82
Fatal error: Uncaught Error: Call to a member function find() on bool in home/myserver/public_html/news/tim.php:6 Stack trace: #0 {main} thrown in /home/myserver/public_html/news/tim.php on line 6
I also checked the logs on the server. They show the same error for many other domains too.
To find a solution, I watched many video tutorials and read several posts. As far as I understand, this problem is caused by a misconfiguration on the DNS side. I tried the suggested solutions, but none of them worked for me. I would be very thankful if someone could guide me in fixing this DNS error.
I have already checked the basic configuration on the server and it looks fine to me. Moreover, I am wary of breaking the existing configuration.
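For reference, a minimal sketch of how name resolution could be checked from PHP itself, without touching any server configuration. The helper name canResolve is hypothetical; it relies only on the documented behaviour that gethostbyname() returns the unmodified host name when the lookup fails.

```php
<?php
// Hypothetical helper: reports whether PHP can resolve a host name to an IPv4 address.
function canResolve(string $host): bool
{
    // gethostbyname() returns the unmodified host name when the lookup fails,
    // so a result that is not a valid IP address means resolution failed.
    $ip = gethostbyname($host);
    return filter_var($ip, FILTER_VALIDATE_IP) !== false;
}

// Check every host name passed on the command line, e.g.:
//   php check_dns.php www.bbc.com example.com
foreach (array_slice($argv ?? [], 1) as $host) {
    echo $host, ': ', canResolve($host) ? 'resolves' : 'does NOT resolve', PHP_EOL;
}
```

If this reports "does NOT resolve" for every external name, the failure is probably in the resolver the server uses rather than in the crawl script.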
Upvotes: -1
Views: 812
Reputation: 21
It seems like your IP address was blocked or temporarily restricted. Crawling is far more complex than just fetching data with file_get_contents. You may need to take care of headers, cookies, and sessions so that your requests look like those of a normal user.
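As a rough sketch of the point about headers, the request could be sent through a stream context that sets a user agent and a few typical headers. The helper names browserLikeContext and fetchPage, the user agent string, and the header values are all assumptions made for illustration. Headers only come into play once the host name resolves, so this would not by itself cure a getaddrinfo failure.

```php
<?php
// Hypothetical helper: builds a stream context that sends browser-like headers.
// The default user agent string is an illustrative placeholder.
function browserLikeContext(string $userAgent = 'Mozilla/5.0 (compatible; ExampleCrawler/1.0)')
{
    return stream_context_create([
        'http' => [ // the https:// wrapper also reads the "http" options
            'method'     => 'GET',
            'user_agent' => $userAgent,
            'header'     => "Accept: text/html\r\nAccept-Language: en-GB,en;q=0.8\r\n",
            'timeout'    => 10, // seconds
        ],
    ]);
}

// Hypothetical helper: returns the response body, or null if the request failed.
function fetchPage(string $url, $context = null): ?string
{
    // @ suppresses the warning; failure is reported through the null return value.
    $body = @file_get_contents($url, false, $context ?? browserLikeContext());
    return $body === false ? null : $body;
}

// Usage from the command line: php fetch.php https://www.bbc.com
if (PHP_SAPI === 'cli' && isset($argv[1])) {
    $body = fetchPage($argv[1]);
    echo $body === null ? "Request failed\n" : strlen($body) . " bytes received\n";
}
```

Checking for null before parsing avoids the "Call to a member function find() on bool" fatal error that appears when the download fails.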
Alternatively, you can use a public API. Big sites like BBC usually provide one to reduce the load on their web servers. For example: https://apitracker.io/a/bbc-news
You can also subscribe to the BBC RSS feed and check it for updates. That makes organising content downloads much easier. For example: https://gist.github.com/mburst/5230448
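A minimal sketch of the RSS approach using PHP's built-in SimpleXML extension, assuming a standard RSS 2.0 layout (channel, item, title, link). The helper name parseRssItems is hypothetical, and the feed URL is whatever you pass on the command line.

```php
<?php
// Hypothetical helper: extracts the title and link of each <item> in an RSS 2.0 document.
// Returns an empty array if the input is not valid XML or has no <channel>.
function parseRssItems(string $xml): array
{
    // Collect XML parse errors internally instead of emitting warnings.
    $previous = libxml_use_internal_errors(true);
    $feed = simplexml_load_string($xml);
    libxml_use_internal_errors($previous);

    if ($feed === false || !isset($feed->channel)) {
        return [];
    }

    $items = [];
    foreach ($feed->channel->item as $item) {
        $items[] = [
            'title' => trim((string) $item->title),
            'link'  => trim((string) $item->link),
        ];
    }
    return $items;
}

// Usage from the command line: php rss.php <feed-url>
if (PHP_SAPI === 'cli' && isset($argv[1])) {
    $xml = @file_get_contents($argv[1]);
    foreach ($xml === false ? [] : parseRssItems($xml) as $item) {
        echo $item['title'], ' - ', $item['link'], PHP_EOL;
    }
}
```

Keeping the parsing separate from the download makes it possible to test against a saved copy of the feed while the network problem is still being sorted out.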
Upvotes: 1