Reputation: 271
I have a webpage which requires a login.
I am using cURL to build the HTTP authentication request. It works, but I am not able to grab all the content from my links: all the images are missing.
How can I grab the images as well?
<?php
// URL of the page to fetch
$URL = "http://10.123.22.38/nagios/nagvis/nagvis/index.php?map=Nagvis_CC";
// Initialize cURL
$ch = curl_init();
// Set the request options
curl_setopt($ch, CURLOPT_URL, $URL); // Load in the destination URL
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // Return the response instead of printing it
curl_setopt($ch, CURLOPT_HTTPAUTH, CURLAUTH_BASIC); // Use HTTP Basic authentication
curl_setopt($ch, CURLOPT_USERPWD, "guest:test"); // Pass the user name and password
// Fetch the URL and collect the transfer details
$content = curl_exec($ch);
$result = curl_getinfo($ch);
// close cURL resource, and free up system resources
curl_close($ch);
echo $content;
print_r($result); // curl_getinfo() returns an array, which echo cannot print
?>
I'm getting this warning message: Warning: curl_error(): 2 is not a valid cURL handle resource in C:\xampp\htdocs\LiveServices\LoginTest.php on line 24
Upvotes: 1
Views: 3791
Reputation: 271
Here is some of the HTML. The images that I want to get:
<img id="backgroundImage" style="z-index: 0;" src="/nagios/nagvis/nagvis/images/maps/Nagvis_CC.png"/>
<a href="/nagios/cgi-bin/extinfo.cgi?type=2&host=business_processes&service=NLThirdPartyLive" target="_self">
And a lot of JavaScript.
I tried to use the Simple HTML DOM library, but the output is just "Array" and nothing else.
require("/simplehtmldom/simple_html_dom.php");
$ch = curl_init();
curl_setopt($ch, CURLOPT_USERAGENT, 'WhateverBrowser1.45');
curl_setopt($ch, CURLOPT_URL, 'http://10.123.22.38/nagios/nagvis/nagvis/index.php?map=Nagvis_CC');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_HTTPAUTH, CURLAUTH_BASIC); // Use HTTP Basic authentication
curl_setopt($ch, CURLOPT_USERPWD, "guest:test"); // Pass the user name and password
curl_setopt($ch, CURLOPT_TIMEOUT, 60);
$result = curl_exec($ch);
$html = str_get_html($result);
echo $ret = $html->find('table[class=header_table]');
echo $result;
Upvotes: -1
Reputation: 124878
cURL doesn't get images or any other 'content', it just gets the raw HTML page. Are you saying you are missing <img /> tags that are present on the original page?
cURL also doesn't parse any CSS or JavaScript, so if the content is modified with those, it won't come through. For example, you may be unable to get a background-image of an element unless you do more scraping, that is, get the associated CSS file and parse that.
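To check which <img /> tags actually arrive in the raw HTML, something along these lines might help. It is only a sketch: extractImageSources is a made-up helper name, it uses PHP's built-in DOMDocument, and the sample markup is the tag quoted in the question standing in for the real curl_exec() result.

```php
<?php
// Hypothetical helper: list the src attribute of every <img> in an HTML string.
function extractImageSources(string $html): array
{
    $doc = new DOMDocument();
    // Collect parser warnings internally instead of printing them,
    // since real-world markup is rarely perfectly valid.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $sources = [];
    foreach ($doc->getElementsByTagName('img') as $img) {
        $src = $img->getAttribute('src');
        if ($src !== '') {
            $sources[] = $src;
        }
    }
    return $sources;
}

// Sample markup from the question; in practice this would be the curl_exec() result.
$html = '<img id="backgroundImage" style="z-index: 0;" src="/nagios/nagvis/nagvis/images/maps/Nagvis_CC.png"/>';
print_r(extractImageSources($html));
```

If the list comes back empty for the real page, the images are probably added by CSS or JavaScript rather than being present in the HTML that cURL receives.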
Upvotes: 2
Reputation: 55465
The main issue I have is that I cannot see the HTML, so I cannot be sure what the problem is. Having said that, two things occur to me.
The first thing to check is whether the image URLs are relative. If they take the form ../xyz/foo.jpg or foo.jpg, you will need to either rewrite each image's src to the full URL or add a base tag to the HTML.
For parsing HTML, use the Simple HTML DOM library as it is faster than rolling your own.
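As a rough illustration of the base tag option, here is a minimal sketch that needs no parsing library. addBaseTag is a hypothetical helper, and the base URL is only my guess at the directory of the page in the question.

```php
<?php
// Hypothetical helper: insert a <base> tag right after the opening <head> tag
// so the browser resolves relative URLs against $baseUrl.
function addBaseTag(string $html, string $baseUrl): string
{
    $tag = '<base href="' . htmlspecialchars($baseUrl, ENT_QUOTES) . '">';
    // A callback is used so that characters in the URL are never
    // interpreted as backreferences in a replacement string.
    $result = preg_replace_callback(
        '/<head\b[^>]*>/i',
        function (array $m) use ($tag): string {
            return $m[0] . $tag;
        },
        $html,
        1,
        $count
    );
    // If the page has no <head>, fall back to prepending the tag.
    return $count > 0 ? $result : $tag . $html;
}

// Assumed base URL: the directory that serves index.php in the question.
$page = '<html><head><title>NagVis</title></head><body><img src="foo.jpg"></body></html>';
echo addBaseTag($page, 'http://10.123.22.38/nagios/nagvis/nagvis/'), "\n";
```

With the base tag in place, the browser requests the images directly from the original server, so this only helps if the browser is allowed to fetch them there.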
The second issue may be that the images also require the user to be logged in. If this is the case, you would also have to download all the images with the same credentials, and either embed them in the content after base64-encoding them or store them temporarily on your server.
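The embedding route could look roughly like this. Both function names are made up for illustration, the credentials and image path are the ones from the question, and I am assuming the image is a PNG.

```php
<?php
// Hypothetical helper: fetch one resource with the same Basic credentials as the page.
// Returns null when the transfer fails.
function fetchWithAuth(string $url, string $userPwd): ?string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_HTTPAUTH, CURLAUTH_BASIC);
    curl_setopt($ch, CURLOPT_USERPWD, $userPwd);
    $body = curl_exec($ch);
    return $body === false ? null : $body;
}

// Hypothetical helper: wrap raw bytes in a data: URI usable as an <img> src.
function toDataUri(string $bytes, string $mimeType): string
{
    return 'data:' . $mimeType . ';base64,' . base64_encode($bytes);
}

// Usage sketch (needs network access, so it is left commented out):
// $src = '/nagios/nagvis/nagvis/images/maps/Nagvis_CC.png';
// $png = fetchWithAuth('http://10.123.22.38' . $src, 'guest:test');
// if ($png !== null) {
//     $content = str_replace('src="' . $src . '"', 'src="' . toDataUri($png, 'image/png') . '"', $content);
// }

echo toDataUri('abc', 'image/png'), "\n"; // prints data:image/png;base64,YWJj
```

Embedding keeps everything in one response but makes the page larger; for big or numerous images, storing them temporarily on your server is likely the better trade-off.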
Upvotes: 0