avukadomusic


PHP code scraping a URL suddenly stopped working

// The web address of the page; I want the first and second numbers next to a "$" sign
$url = 'the web address';
$str = file_get_contents($url);

// A space, "$", optional spaces, then digits with optional ","/"." groups, e.g. " $ 1,234.56"
// (group 2 holds the number)
preg_match_all('/ ([$]) *(\d+(?:[.,]\d+)*)/', $str, $matches, PREG_SET_ORDER);

// With PREG_SET_ORDER each $matches[n] is one match: take the 1st and the 4th
$first  = $matches[0][2];
$second = $matches[3][2];

// Remove the thousands/decimal separators
$bad_symbols = array(",", ".");
$first  = str_replace($bad_symbols, "", $first);
$second = str_replace($bad_symbols, "", $second);

echo $first . "<br>";
echo $second;

It worked fine until yesterday. What could be the problem?

Upvotes: 0

Views: 263

Answers (2)

Pascal MARTIN

Reputation: 401022

I see at least two possible explanations:

  • The HTML of the site has changed; maybe only a little, but enough to break your regex.
    • You could test the return value of preg_match_all
    • If it returns 0, your regex didn't match anything, which may indicate that the HTML of the page is not the same anymore (it returns false only when an error occurred)
    • Then, you might have to modify your regex
  • The administrator of the remote server (or the code generating the page) has banned you
    • Maybe the website has detected that you were scraping it (either because you were hitting their server too hard, or because they saw their content on your site)
    • And they banned your IP (for instance)
    • To detect that, check the return value of file_get_contents; if it's false, this might be the cause of the problem
    • Can you try getting that HTML page from your server, using wget on the command line?
  • A third one, as suggested by others: maybe the configuration of your server has changed, and you can't use file_get_contents over HTTP anymore...
    • A solution would be to use cURL, for instance
    • Check the allow_url_fopen directive in your configuration
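For the first explanation, here is a minimal sketch of checking what preg_match_all returns instead of silently ending up with undefined variables. It assumes PHP 8 (for preg_last_error_msg() and arrow functions); extract_amounts() is a hypothetical helper name, and the pattern is only an assumption about the page's markup (unlike the question's regex, it does not require a space before the "$").

```php
<?php
// Sketch (PHP 8): extract "$" amounts and fail loudly when nothing matches.
// extract_amounts() is a hypothetical helper; adapt the pattern to the real page.

function extract_amounts(string $html): array
{
    // "$", optional whitespace, then digits with optional ","/"." groups, e.g. "$ 1,234.56"
    $count = preg_match_all('/\$\s*(\d+(?:[.,]\d+)*)/', $html, $matches);

    if ($count === false) {
        // false: the regex engine itself failed (e.g. backtrack limit reached)
        throw new RuntimeException('preg_match_all error: ' . preg_last_error_msg());
    }
    if ($count === 0) {
        // 0: the pattern ran but found nothing, so the page markup has probably changed
        throw new RuntimeException('No "$" amounts found; has the page changed?');
    }

    // Default PREG_PATTERN_ORDER: $matches[1] lists every captured number.
    // Strip the separators, as the question's code does.
    return array_map(
        fn (string $n): string => str_replace([',', '.'], '', $n),
        $matches[1]
    );
}

$amounts = extract_amounts('<td>$ 1,234</td><td>$56.78</td><td>$9</td><td>$ 10</td>');
echo $amounts[0] . "<br>" . ($amounts[3] ?? 'n/a');
```

Throwing on a count of 0 turns "it suddenly stopped working" into an explicit message pointing at the markup rather than at the fetch.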

If you enable error_reporting (and display_errors, so that the messages are actually shown), you might also get some information that could prove useful...
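A minimal sketch combining error reporting with a check of file_get_contents' return value, assuming PHP 8; fetch_page() is a hypothetical helper name. When the fetch fails, PHP emits a warning, and error_get_last() lets you put that message into the exception. For an HTTP URL, that warning usually contains the status line (for example "403 Forbidden"), which would point to the ban hypothesis.

```php
<?php
// Sketch (PHP 8): report *why* file_get_contents() failed.
// fetch_page() is a hypothetical helper name.

error_reporting(E_ALL);          // report every notice and warning while debugging
ini_set('display_errors', '1');  // and print them in the output

function fetch_page(string $url): string
{
    $html = file_get_contents($url);
    if ($html === false) {
        // The warning PHP just emitted explains the failure
        // (DNS error, "HTTP request failed! ... 403 Forbidden", wrapper disabled, ...)
        $err = error_get_last();
        throw new RuntimeException('Could not fetch ' . $url . ': ' . ($err['message'] ?? 'unknown error'));
    }
    return $html;
}
```

In the question's code, `$str = fetch_page($url);` would replace the bare file_get_contents() call, wrapped in a try/catch that echoes `$e->getMessage()`.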

Upvotes: 3

usoban

Reputation: 5478

Maybe the system administrator has disabled the allow_url_fopen directive, which means file_get_contents() can no longer open files that are not on your server (such as remote URLs). Check what file_get_contents() returns, because you gave us very little information about the error.
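A sketch for testing this hypothesis: report whether allow_url_fopen is on and, if it is off, fetch the page through cURL instead. It assumes the cURL extension is installed; url_fopen_enabled() and fetch_with_curl() are hypothetical helper names, and the timeout value is an arbitrary choice.

```php
<?php
// Sketch: detect allow_url_fopen and fall back to cURL.
// url_fopen_enabled() and fetch_with_curl() are hypothetical helper names;
// the curl_* calls require the cURL extension.

function url_fopen_enabled(): bool
{
    // ini_get() returns the setting as a string such as "1", "On" or ""
    return filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN);
}

function fetch_with_curl(string $url): string
{
    $ch = curl_init($url);
    if ($ch === false) {
        throw new RuntimeException('curl_init() failed');
    }
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow redirects
        CURLOPT_TIMEOUT        => 10,   // give up after 10 seconds (arbitrary choice)
    ]);
    $body   = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    $error  = curl_error($ch);

    // A 4xx/5xx status (e.g. 403 if your IP was blocked) still returns a body, so check it
    if ($body === false || $status >= 400) {
        throw new RuntimeException("cURL request failed (HTTP $status): $error");
    }
    return $body;
}

echo url_fopen_enabled()
    ? "allow_url_fopen is on: file_get_contents() can open URLs\n"
    : "allow_url_fopen is off: fetch the page with cURL instead\n";
```

Checking the HTTP status explicitly matters because, without CURLOPT_FAILONERROR, cURL treats an error page as a successful transfer.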

Another possible problem, as mentioned above, is that the remote site has changed :)

Upvotes: 0
