Reputation: 271

Curl grab HTML content

I have a webpage which requires a login.

I am using cURL to build the HTTP authentication request. It works, but I am not able to grab all the content of the page: all the images are missing.

How can I grab the images as well?

<?php

// Target page (requires HTTP authentication)
$URL = "http://10.123.22.38/nagios/nagvis/nagvis/index.php?map=Nagvis_CC";

// Initialize cURL
$ch = curl_init();

// Set the request options
curl_setopt($ch, CURLOPT_URL, $URL);                 // Destination URL
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);         // Return the response as a string
curl_setopt($ch, CURLOPT_HTTPAUTH, CURLAUTH_BASIC);  // Use HTTP Basic authentication
curl_setopt($ch, CURLOPT_USERPWD, "guest:test");     // User name and password

// Fetch the URL; the body is returned as a string because of CURLOPT_RETURNTRANSFER
$content = curl_exec($ch);

// Transfer details (an array); must be read before the handle is closed
$result = curl_getinfo($ch);

// Close the cURL handle and free system resources
curl_close($ch);

echo $content;
print_r($result); // curl_getinfo() returns an array, so echo would only print "Array"

?>

I'm getting this warning message:

Warning: curl_error(): 2 is not a valid cURL handle resource in C:\xampp\htdocs\LiveServices\LoginTest.php on line 24

Upvotes: 1

Answers (3)

QLiu

Reputation: 271

Here is some of the HTML, including the images that I want to get:

<img id="backgroundImage" style="z-index: 0;" src="/nagios/nagvis/nagvis/images/maps/Nagvis_CC.png"/>

<a href="/nagios/cgi-bin/extinfo.cgi?type=2&host=business_processes&service=NLThirdPartyLive" target="_self">

And a lot of javascript.

I tried to use the Simple HTML DOM library, but the output is just "Array" and nothing else.

require("/simplehtmldom/simple_html_dom.php");

$ch = curl_init();
curl_setopt($ch, CURLOPT_USERAGENT, 'WhateverBrowser1.45');
curl_setopt($ch, CURLOPT_URL, 'http://10.123.22.38/nagios/nagvis/nagvis/index.php?map=Nagvis_CC');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_HTTPAUTH, CURLAUTH_BASIC); // HTTP Basic authentication
curl_setopt($ch, CURLOPT_USERPWD, "guest:test");    // User name and password
curl_setopt($ch, CURLOPT_TIMEOUT, 60);
$result = curl_exec($ch);

$html = str_get_html($result);
// find() without an index returns an array of elements, so this echo prints "Array"
echo $ret = $html->find('table[class=header_table]');

echo $result;

Upvotes: -1

Tatu Ulmanen

Reputation: 124878

cURL doesn't fetch images or any other linked 'content'; it only retrieves the raw HTML of the URL you request. Are you saying you are missing <img /> tags that are present on the original page?

cURL also doesn't parse any CSS or JavaScript, so if the content is modified with those, it won't come through. For example, you may be unable to get a background-image of an element unless you do more scraping, that is, get the associated CSS file and parse that.
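
One way to check what actually came back is to list the image references in the fetched markup. This is a minimal sketch, assuming the page body is already in a string after curl_exec(); the helper name extractImageSources is hypothetical, and it relies on PHP's built-in DOM extension rather than anything mentioned in the question:

```php
<?php
// Hypothetical helper: returns the src attribute of every <img> in the markup.
function extractImageSources(string $html): array
{
    $doc = new DOMDocument();
    // Collect libxml warnings internally instead of printing them for imperfect HTML.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $sources = [];
    foreach ($doc->getElementsByTagName('img') as $img) {
        $src = $img->getAttribute('src');
        if ($src !== '') {
            $sources[] = $src;
        }
    }
    return $sources;
}

// Sample markup taken from the question.
$html = '<img id="backgroundImage" style="z-index: 0;" src="/nagios/nagvis/nagvis/images/maps/Nagvis_CC.png"/>';
print_r(extractImageSources($html));
```

If the tags show up here, the HTML is complete and the images are simply separate resources that still have to be requested.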

Upvotes: 2

Yacoby

Reputation: 55465

The main issue I have is that I cannot see the html, so I cannot be sure what the problem is. Having said that, two things occur to me.

The first thing to check is whether the image URLs are relative. If they appear in the form ../xyz/foo.jpg or foo.jpg, then you will either need to rewrite each image's src to the full URL or add a base tag to the HTML.

For parsing HTML, use the Simple HTML DOM library as it is faster than rolling your own.
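
For the first point, adding a base tag could look roughly like the following. It is a sketch only: the helper name addBaseTag is hypothetical, the base URL is assumed to be the host from the question, and it uses a plain regular expression instead of an HTML parser to keep the example short:

```php
<?php
// Hypothetical helper: inserts a <base> tag so relative URLs resolve against $baseUrl.
function addBaseTag(string $html, string $baseUrl): string
{
    $base = '<base href="' . htmlspecialchars($baseUrl, ENT_QUOTES) . '">';

    // Insert right after the first opening <head ...> tag.
    $patched = preg_replace_callback(
        '/<head\b[^>]*>/i',
        function (array $m) use ($base) {
            return $m[0] . $base;
        },
        $html,
        1,
        $count
    );

    // Fall back to prepending the tag when the markup has no <head>.
    return $count > 0 ? $patched : $base . $html;
}

// Assumed base URL: the host used in the question.
echo addBaseTag('<html><head><title>Map</title></head><body></body></html>', 'http://10.123.22.38/');
```

The browser will then request the images from the original host, so this only helps if the browser itself is allowed to load them.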

The second issue may be that the images also require the user to be logged in. If this is the case, you would also have to download all the images and either embed them in the content after Base64-encoding them or store them temporarily on your server.
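
The embedding option could be sketched like this. The helper names toDataUri and fetchWithBasicAuth are hypothetical, the credentials and image path are the ones shown in the question, and the MIME type is assumed to be image/png from the file extension:

```php
<?php
// Hypothetical helper: builds a data: URI from raw bytes and a MIME type.
function toDataUri(string $bytes, string $mimeType): string
{
    return 'data:' . $mimeType . ';base64,' . base64_encode($bytes);
}

// Hypothetical helper: downloads one URL with the same Basic credentials as the page request.
// Returns null when the transfer fails.
function fetchWithBasicAuth(string $url, string $userPwd): ?string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_HTTPAUTH, CURLAUTH_BASIC);
    curl_setopt($ch, CURLOPT_USERPWD, $userPwd);
    $body = curl_exec($ch);
    return $body === false ? null : $body;
}

// Usage (performs a network request, so it is left commented out):
// $path = '/nagios/nagvis/nagvis/images/maps/Nagvis_CC.png';
// $png = fetchWithBasicAuth('http://10.123.22.38' . $path, 'guest:test');
// if ($png !== null) {
//     $content = str_replace($path, toDataUri($png, 'image/png'), $content);
// }
```

Embedding keeps the credentials on the server, at the cost of a larger HTML response; storing the files temporarily avoids that but needs cleanup.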

Upvotes: 0
