Reputation: 58
I'm trying to retrieve information from a web page using simple_html_dom, like this:
<?php
include_once('dom/simple_html_dom.php');
$urlpart = "http://w2.brreg.no/motorvogn/";
$url = "http://w2.brreg.no/motorvogn/heftelser_motorvogn.jsp?regnr=BR15597";
$html = file_get_html($url);

// Fetch every link whose markup contains "dagb"
foreach ($html->find('a') as $element) {
    if (preg_match('/dagb/', $element)) {
        $result = $urlpart . $element->href;
        $resultcontent = file_get_contents($result);
        echo $resultcontent;
    }
}
?>
The $result variable first gives me this URL: http://w2.brreg.no/motorvogn/dagbokutskrift.jsp?dgbnr=2011365320&embnr=0&regnr=BR15597
When I open the above URL in my browser, I get the content I expect.
When I retrieve the content into $resultcontent, I get a different result: a page that says "Invalid input" in Norwegian.
Any ideas why?
Upvotes: 2
Views: 173
Reputation: 5689
The problem is with your URL query parameter.
http://w2.brreg.no/motorvogn/dagbokutskrift.jsp?dgbnr=2011365320&embnr=0&regnr=BR15597
The string '&reg' in the URL is converted to the symbol ® in the file_get_contents call, which stops you from getting the expected result.
You can apply html_entity_decode to the URL before passing it to file_get_contents:
$resultcontent = file_get_contents(html_entity_decode($result));
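To see what the decoding step changes, here is a minimal sketch. It assumes the href comes back from the parser with its ampersands entity-encoded as `&amp;`; that is an assumption about the page's markup, not something shown in the question, and the sample string is built from the URL above for illustration only.

```php
<?php
// Assumption: the href attribute is returned with "&" encoded as "&amp;".
$href = 'dagbokutskrift.jsp?dgbnr=2011365320&amp;embnr=0&amp;regnr=BR15597';

// html_entity_decode() turns HTML entities back into literal characters,
// so each "&amp;" becomes a plain "&" separator in the query string.
$decoded = html_entity_decode($href);

echo $decoded, PHP_EOL;
```

If the printed string differs from the raw href, the undecoded version is probably what the server was rejecting.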
Upvotes: 1
Reputation: 4202
foreach ($html->find('a') as $element) {
    if (preg_match('/dagb/', $element)) {
        $result = $urlpart . $element->href;
        $resultcontent = file_get_contents(html_entity_decode($result));
        echo $resultcontent;
    }
}
This should do the trick.
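If you want to check the parameters before fetching, something like the following sketch may help. `queryParamsFromHref` is a hypothetical helper, not part of simple_html_dom, and the `&amp;`-encoded href is again an assumption about what the parser returns.

```php
<?php
// Hypothetical helper: decode an href, join it to the base URL, and
// return the query parameters so they can be inspected before fetching.
function queryParamsFromHref(string $base, string $href): array
{
    $url = $base . html_entity_decode($href);

    // Extract only the part after "?" and split it into key/value pairs.
    $query = parse_url($url, PHP_URL_QUERY);
    parse_str((string) $query, $params);

    return $params;
}

$params = queryParamsFromHref(
    'http://w2.brreg.no/motorvogn/',
    'dagbokutskrift.jsp?dgbnr=2011365320&amp;embnr=0&amp;regnr=BR15597'
);
print_r($params);
```

With the decoding step in place, the array should contain the dgbnr, embnr and regnr keys; without it, the regnr key would be missing, which is one way to confirm that the server is receiving a malformed query string.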
Upvotes: 1