This code was working for days until it stopped working at the worst possible time. It simply pulls weather alert information from a NOAA website and displays it on my page. Can someone please tell me why this would suddenly fail?
$file = file_get_contents("http://forecast.weather.gov/showsigwx.php?warnzone=ARZ018&warncounty=ARC055");
preg_match_all('#<div id="content">([^`]*?)<\/div>#', $file, $matches);
$content = $matches[1];
echo "content = ".$content."</br>" ;
echo "matches = ".$matches."</br>" ;
print_r ($matches); echo "</br>";
echo "file </br>".$file."</br></br>" ;
Now all I get is an empty array, even though the page itself is still fetched correctly.
This is the output:
content = Array
matches = Array
Array ( [0] => Array ( ) [1] => Array ( ) )
file = the full page as requested by file_get_contents
Your regexp is trying to match the literal string <div id="content">, followed by some (as few as possible) chars that are not backticks (`), followed by the literal string </div>.
However, in the current set of NOAA warnings and advisories, there is a backtick between <div id="content"> and </div>:
A SLIGHT RISK FOR SEVERE THUNDERSTORMS IS IN EFFECT FOR NORTHEAST MISSISSIPPI SOUTH OF A CALHOUN CITY TO FULTON MISSISSIPPI LINE FROM LATE THIS AFTERNOON THROUGH THIS EVENING. DAMAGING WINDS WILL BE THE MAIN THREAT...HOWEVER AN ISOLATED TORNADO CAN`T BE RULED OUT.
That's why your regexp doesn't match.
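To see the failure in isolation, here is a minimal sketch that runs the original pattern against two made-up strings. The sample markup is an assumption standing in for the real NOAA page; only the pattern is taken from the question.

```php
<?php
// Hypothetical sample markup; the real page is much larger.
$withoutBacktick = '<div id="content">A TORNADO CANNOT BE RULED OUT.</div>';
$withBacktick    = '<div id="content">A TORNADO CAN`T BE RULED OUT.</div>';

// The pattern from the question: [^`] matches any character except a backtick.
$pattern = '#<div id="content">([^`]*?)<\/div>#';

$hitsClean    = preg_match_all($pattern, $withoutBacktick, $m1);
$hitsBacktick = preg_match_all($pattern, $withBacktick, $m2);

var_dump($hitsClean);    // int(1)
var_dump($hitsBacktick); // int(0): the backtick blocks the match
print_r($m2);            // two empty sub-arrays, as in the question's output
```

The second call leaves the matches array with two empty sub-arrays, which is the same shape as the output shown in the question.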
The simplest "fix" would be to replace the regexp with, say:
'#<div id="content">(.*?)<\/div>#s'
where the dot (.) will, with the s modifier, match any character, newlines included.
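As a rough illustration of what the s modifier changes, the sketch below matches a made-up two-line string with and without it. The sample string is an assumption, not actual NOAA output.

```php
<?php
// Hypothetical content containing both a backtick and a newline.
$html = "<div id=\"content\">CAN`T BE RULED OUT.\nSECOND LINE.</div>";

// With the s modifier, the dot also matches newlines.
$withS = preg_match('#<div id="content">(.*?)</div>#s', $html, $m);

// Without it, the dot stops at the newline, so nothing matches here.
$withoutS = preg_match('#<div id="content">(.*?)</div>#', $html, $n);

var_dump($withS);    // int(1)
var_dump($withoutS); // int(0)
echo $m[1], "\n";    // the full two-line text, backtick included
```

Note that the lazy (.*?) stops at the first </div> it finds, so a div nested inside the content block would cut the capture short. That is one reason a parser is the more robust choice.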
However, what you really should do is use a proper HTML parser to extract the text, instead of trying to parse HTML with regexps.
Edit: Here's a quick example (untested!) of how you could do this with DOMDocument:
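// URL taken from the question
$url = "http://forecast.weather.gov/showsigwx.php?warnzone=ARZ018&warncounty=ARC055";
$html = file_get_contents( $url );
$doc = new DOMDocument();
$doc->loadHTML( $html );
// textContent is the element's text with all markup stripped
$content = $doc->getElementById( 'content' )->textContent;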
or even just:
$doc = new DOMDocument();
$doc->loadHTMLFile( $url );
$content = $doc->getElementById( 'content' )->textContent;
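A slightly more defensive variant might look like the sketch below. It assumes the page may contain malformed HTML (which makes loadHTML emit warnings) and that the content element may be missing. The helper name extractContent and the sample markup are hypothetical, not part of the snippets above.

```php
<?php
// Hypothetical helper: returns the text of the element with id="content",
// or null if the document has no such element.
function extractContent(string $html): ?string
{
    // Collect libxml parse warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML($html);

    // Discard the collected warnings and restore the previous setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $node = $doc->getElementById('content');
    return $node === null ? null : trim($node->textContent);
}

// Made-up sample markup standing in for the fetched page.
$sample = '<html><body><div id="content">AN ISOLATED TORNADO CAN`T BE RULED OUT.</div></body></html>';
$missing = '<html><body><p>no content div here</p></body></html>';

var_dump(extractContent($sample));
var_dump(extractContent($missing)); // NULL
```

Checking for null before reading textContent avoids an error if the page layout changes and the content element disappears.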