Reputation: 1625
Hi, I am trying to fetch a page via PHP, get its HTML, modify it a bit (to highlight some keywords), and display it as an overlay in my page (using jQuery).
//My php page data.php
<?php
$html= file_get_contents($_GET['url']);
echo $html;
?>
//My jquery ajax request to data.php from page main.html
function test()
{
$.ajax({
type: 'GET',
url: 'data.php',
data: 'url=http://www.developphp.com/view_lesson.php?v=338',
cache: false,
success: function(result)
{
$("#overlay").append(result);
}
});
}
As you can see, since the web page uses relative URLs, I am having issues displaying it in an overlay. I searched for a way to convert relative URLs to absolute ones but did not find anything useful. Can you please point me in the right direction?
Upvotes: 2
Views: 2608
Reputation: 1625
With all your help, I ended up with something like this.
Instead of trying to replace the relative paths with absolute ones, I added an HTML <base> tag to the scraped content.
<?php
include 'URL2.php';
error_reporting(0); //suppress DOM errors
$content=file_get_contents($_GET['fullURL']); //http://somewebsite.com/page1.html
$url = new Net_URL2($_GET['fullURL']);
$baseURL= $url->host; //host only, e.g. somewebsite.com
if(strpos($baseURL,'http://')!==0) //strpos returns false (never a negative number) when there is no match
{
$baseURL='http://'.$baseURL;
}
$dom=new DomDocument();
$dom->loadHTML($content);
$head = $dom->getElementsByTagName('head')->item(0);
$base = $dom->createElement('base');
$base->setAttribute('href',$baseURL); //use the base URL computed above
if ($head->hasChildNodes()) {
$head->insertBefore($base,$head->firstChild);
} else {
$head->appendChild($base);
}
echo $dom->saveHTML();
?>
Upvotes: 0
Reputation: 85518
I like @charlietfl's solution. However, I think it makes more sense to manipulate the scraped content server-side before passing it to the client. You can do that using DomDocument.
The following code converts all relative <img> src paths to absolute paths before echoing the result. Use the same approach for the href attributes of <a> tags, and so on.
error_reporting(0); //suppress DOM errors
$basePath='http://www.developphp.com/'; //use parse_url to get the basepath dynamically
$content=file_get_contents('http://www.developphp.com/view_lesson.php?v=338');
$dom=new DomDocument();
$dom->loadHTML($content);
$images = $dom->getElementsByTagName('img');
foreach ($images as $image) {
$src=$image->attributes->getNamedItem("src")->value;
if (strpos($src, $basePath)!==0) { //src does not already start with the base path
$image->attributes->getNamedItem("src")->value=$basePath.$src;
}
}
echo $dom->saveHTML();
Upvotes: 1
Reputation: 171679
You can start here:
function test(){
var domain='http://www.developphp.com/', path= 'view_lesson.php?v=338';
$.ajax({
type: 'GET',
url: 'data.php',
data: { url: domain + path},
cache: false,
success: function(result)
{
var $html=updatePaths( $(result), domain );
$("#overlay").append($html);
}
});
}
function updatePaths( $html, domain){
/* loop over all images and adjust src*/
$html.find('img').attr('src',function(i, src){
if(src.indexOf(domain) ==-1){
src= domain+src
}
return src;
})
/* return updated jQuery object*/
return $html;
}
This will only work for the simplest case. It fails when the remote site uses a variation of the domain you pass in (for example, omitting www when you include it), and when image paths use ../ to move up a directory.
You would have to create a far more robust set of tests to build the final path correctly.
My intent was only to show how to approach the situation.
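For the ../ and domain-variation cases mentioned above, one possible approach is to let the standard URL constructor do the resolution instead of concatenating strings. This is only a sketch: toAbsolute and pageUrl are hypothetical names that are not part of the code above, and it assumes an environment where the URL class is available (current browsers, Node.js).

```javascript
// Hypothetical helper: resolve a possibly-relative path against the URL
// of the page that was scraped. Absolute URLs pass through unchanged.
function toAbsolute(path, pageUrl) {
  try {
    // new URL(relative, base) handles ../, root-relative (/x) and
    // protocol-relative (//host/x) paths.
    return new URL(path, pageUrl).href;
  } catch (e) {
    // Leave anything the URL parser rejects untouched.
    return path;
  }
}

// Assumed example values, for illustration only.
var pageUrl = 'http://www.developphp.com/view_lesson.php?v=338';
console.log(toAbsolute('images/logo.png', pageUrl));
console.log(toAbsolute('../img/a.png', 'http://example.com/a/b/page.html'));
console.log(toAbsolute('http://other.example/x.png', pageUrl));
```

Inside updatePaths, the attr callback could then return toAbsolute(src, pageUrl), with the full page URL passed in rather than just the domain.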
Upvotes: 2