Niranjan Sonachalam

Reputation: 1625

HTML-PHP Relative Path to Absolute Path

Hi, I am trying to fetch a page via PHP, get its HTML, modify the HTML slightly (to highlight some keywords), and display it as an overlay in my page (using jQuery).

//My php page data.php
<?php
$html=  file_get_contents($_GET['url']);
echo $html;
?>

//My jquery ajax request to data.php from page main.html
function test()
{
    $.ajax({
        type: 'GET',
        url: 'data.php',
        data: 'url=http://www.developphp.com/view_lesson.php?v=338',
        cache: false,
        success: function(result)
        {
            $("#overlay").append(result);
        }
    });
}

As you can see, since the webpage uses relative URLs, I am having issues displaying it in an overlay. I tried searching for a way to convert relative paths to absolute ones but did not find anything useful. Can you please point me in the right direction?

Upvotes: 2

Views: 2608

Answers (3)

Niranjan Sonachalam

Reputation: 1625

With all your help, I did something like this.

Instead of trying to replace the relative paths with absolute ones, I inserted an HTML base tag into the scraped content.

<?php
include 'URL2.php';
error_reporting(0); //suppress DOM errors
$content=file_get_contents($_GET['fullURL']);  //http://somewebsite.com/page1.html
$url = new Net_URL2($_GET['fullURL']);
$baseURL= $url->host; //somewebsite.com (host only, without the scheme)
if(strpos($baseURL,'http://')!==0) //strpos returns false when not found, never a negative number
{
    $baseURL='http://'.$baseURL; //http://somewebsite.com
}
$dom=new DomDocument();
$dom->loadHTML($content);
$head = $dom->getElementsByTagName('head')->item(0);
$base = $dom->createElement('base');
$base->setAttribute('href',$baseURL); //use the base URL computed above

if ($head->hasChildNodes()) {
    $head->insertBefore($base,$head->firstChild);
} else {
    $head->appendChild($base);
}

echo $dom->saveHTML();
?>
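
Why the base tag helps can be checked outside PHP: a browser resolves every relative URL against the base href, and the standard `URL` constructor in JavaScript follows the same resolution rules. The sketch below is only illustrative; `baseHrefFor` is a hypothetical helper and the example.com addresses are placeholders, not part of the code above. It suggests that using the page's directory as the base, rather than only the host, may matter for pages that are not at the site root.

```javascript
// Hypothetical helper: the directory of the fetched page, usable as a <base href>.
function baseHrefFor(pageUrl) {
  // "." resolves to the directory containing the page; file name and query string are dropped.
  return new URL('.', pageUrl).href;
}

const page = 'http://example.com/lessons/php/page1.html?v=338';
const dirBase = baseHrefFor(page);         // directory of the page
const hostBase = new URL('/', page).href;  // site root only

// A document-relative path resolves differently against each base.
console.log(new URL('img/logo.png', dirBase).href);
console.log(new URL('img/logo.png', hostBase).href);

// A root-relative path resolves the same way against both.
console.log(new URL('/css/site.css', dirBase).href);
console.log(new URL('/css/site.css', hostBase).href);
```

If the scraped site mostly uses root-relative paths, a host-only base behaves the same as the directory base; the difference only shows up for document-relative paths.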

Upvotes: 0

davidkonrad

Reputation: 85518

I like @charlietfl's solution. However, I think it makes more sense to manipulate the scraped content server-side before passing it to the client. You can do that using DomDocument.

The following code converts the relative src paths of all <img> elements to absolute paths before echoing the result. Use the same approach for the href attributes of <a> tags, and so on.

error_reporting(0); //suppress DOM errors
$basePath='http://www.developphp.com/'; //use parse_url to get the basepath dynamically
$content=file_get_contents('http://www.developphp.com/view_lesson.php?v=338');
$dom=new DomDocument();
$dom->loadHTML($content);
$images = $dom->getElementsByTagName('img');
foreach ($images as $image) {
    $src=$image->getAttribute('src');
    //prefix only relative paths; leave absolute (http://, https://, //) URLs untouched
    if (!preg_match('#^(https?:)?//#i', $src)) {
        $image->setAttribute('src', $basePath.ltrim($src, '/'));
    }
}
echo $dom->saveHTML();

Upvotes: 1

charlietfl

Reputation: 171679

You can start here:

function test(){  
    var domain='http://www.developphp.com/', path= 'view_lesson.php?v=338';
    $.ajax({
            type: 'GET',
            url: 'data.php',
            data: { url: domain + path},
            cache: false,
            success: function(result) 
            {      
               var $html=updatePaths( $(result), domain );

                 $("#overlay").append($html);

            }
        });

}

function updatePaths( $html, domain){
  /* loop over all images and adjust src*/
  $html.find('img').attr('src',function(i, src){
    if(src && src.indexOf(domain) ==-1){
      src= domain+src;
    }
    return src;
  });
  /* return updated jQuery object*/
  return $html;

}

This will only work for the simplest case, where the remote site doesn't use a variation of the domain you use (for example, it omits www and you include it). It also won't work if image paths use ../ to move up a directory.

You would have to create a far more robust set of tests to build the final path correctly.

My intent was to show you how to approach the situation.
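
For the harder cases mentioned here (../ segments, a domain with or without www, paths that are already absolute), one option is to delegate to built-in URL resolution instead of string concatenation. This is a minimal sketch, assuming the standard `URL` constructor is available; `toAbsolute` is a hypothetical name, and `pageUrl` stands for the address that was passed to data.php.

```javascript
// Hypothetical helper: resolve one src/href value against the URL of the scraped page.
function toAbsolute(path, pageUrl) {
  try {
    // Handles document-relative, root-relative, "../" and already-absolute values.
    return new URL(path, pageUrl).href;
  } catch (e) {
    // Leave values that cannot be parsed untouched.
    return path;
  }
}

const pageUrl = 'http://www.developphp.com/view_lesson.php?v=338';

console.log(toAbsolute('images/logo.png', pageUrl));
console.log(toAbsolute('../images/logo.png', 'http://www.developphp.com/a/b/page.php'));
// An absolute URL on a different host variant is returned unchanged.
console.log(toAbsolute('http://developphp.com/x.png', pageUrl));
```

Inside updatePaths, the callback could then be reduced to something like `$html.find('img').attr('src', function (i, src) { return src ? toAbsolute(src, pageUrl) : src; })`, with `pageUrl` passed in place of `domain`.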

Upvotes: 2
