ken kaneki

Reputation: 21

How to download PDF files from a website URL using PHP

I am trying to build a tool that takes a keyword from the user, searches for it on Google, finds all the sites that have PDF files matching that keyword, and downloads them. I was able to get the HTML of the Google search results for the keyword, but the links in that HTML are not usable as they are, and I can't download the PDF files from them.

<?php
if(isset($_POST['submit'])){

    // urlencode() escapes spaces (as '+') and any other special characters in the keyword
    $endpoint = urlencode($_POST['info'] . ' pdf');
    $page = file_get_contents('https://www.google.com.pk/search?dcr=0&source=hp&q='.$endpoint.'&oq='.$endpoint.'&gs_l=psy-ab.3..35i39k1l2j0j0i131k1j0l3j0i131k1j0l2.73519.74668.0.75122.9.7.0.0.0.0.424.424.4-1.1.0....0...1.1.64.psy-ab..8.1.422.0...0.U3V3CxpsqhA');

    $dom = new DOMDocument;

    @$dom->loadHTML($page);

    $links = $dom->getElementsByTagName('a');
    foreach ($links as $link){
        echo $link->nodeValue;
        echo $link->getAttribute('href'), '<br>';
    }

}
?>

This is what I have so far to get the HTML of the Google search results. I am stuck here. Please guide me on what I should do next.

Upvotes: 0

Views: 14699

Answers (2)

Andrea Golin

Reputation: 3559

I think you should request the file at the link you just crawled, then send it to the browser with the correct headers:

<?php
// Tell the browser the response is a PDF and should be saved as a file
header('Content-Type: application/pdf');
header('Content-Disposition: attachment; filename="downloaded.pdf"');

Or use cURL. Note that header() must be called before any other output is sent, so you could split your app flow into two or three steps:

  1. Google the keyword
  2. Present the user a list of possible matches
  3. Let the user choose which one to download (and fire up the request with the content-type header)

See also this other answer: https://stackoverflow.com/a/20080402/3279175
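
For steps 1 and 2, a minimal sketch of how the crawled links could be turned into a list of PDF candidates. It assumes the result page wraps each target in an href of the form /url?q=<target>&..., which is how the plain-HTML Google results usually look but is not guaranteed. extractPdfLinks() is a hypothetical helper name, not part of PHP or of the question's code.

```php
<?php
// Hypothetical helper: collect PDF URLs from a search-result HTML page.
function extractPdfLinks(string $html): array
{
    if ($html === '') {
        return [];
    }

    $dom = new DOMDocument();
    // Suppress warnings caused by malformed real-world HTML.
    @$dom->loadHTML($html);

    $pdfs = [];
    foreach ($dom->getElementsByTagName('a') as $a) {
        $href = $a->getAttribute('href');

        // Assumption: result links look like /url?q=<target>&sa=...
        // If so, the real destination is the "q" query parameter.
        if (strpos($href, '/url?') === 0) {
            parse_str((string) parse_url($href, PHP_URL_QUERY), $query);
            $href = $query['q'] ?? '';
        }

        // Keep only links whose path ends in ".pdf" (case-insensitive).
        $path = parse_url($href, PHP_URL_PATH);
        if (is_string($path) && preg_match('/\.pdf$/i', $path)) {
            $pdfs[] = $href;
        }
    }

    // Drop duplicates and reindex from 0.
    return array_values(array_unique($pdfs));
}
```

In the question's code, $page could be passed to this function and the returned URLs shown to the user as the list of possible matches.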

Upvotes: 2

Claudio

Reputation: 5203

Try using file_put_contents and fopen:

$url = 'http:// ... ';
file_put_contents('file.pdf', fopen($url, 'r'));
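
A slightly more defensive sketch of the same idea, in case it helps: it streams the source to disk in chunks, checks for failures, and keeps the file only if it starts with the "%PDF-" marker that PDF files begin with. savePdf() is a hypothetical helper name, and fetching http(s) URLs this way assumes allow_url_fopen is enabled in php.ini.

```php
<?php
// Hypothetical helper: stream $url into $dest and keep the file
// only if it looks like a PDF. Returns true on success.
function savePdf(string $url, string $dest): bool
{
    // For http(s) URLs this needs allow_url_fopen=1.
    $in = @fopen($url, 'rb');
    if ($in === false) {
        return false;
    }

    $out = @fopen($dest, 'wb');
    if ($out === false) {
        fclose($in);
        return false;
    }

    // Copy stream to stream so large files are not held in memory.
    $bytes = stream_copy_to_stream($in, $out);
    fclose($in);
    fclose($out);

    // PDF files begin with the marker "%PDF-"; check the first 5 bytes.
    $ok = $bytes !== false
        && file_get_contents($dest, false, null, 0, 5) === '%PDF-';

    if (!$ok) {
        // Not a PDF (for example an HTML error page): discard it.
        @unlink($dest);
    }
    return $ok;
}
```

Usage would be along the lines of savePdf($url, 'file.pdf'), with the return value telling you whether the link really pointed at a PDF.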

Upvotes: 1
