johnlemon

Reputation: 21449

How to download a file using curl in php?

How can I use cURL to download a file in PHP when CURLOPT_HEADER is set to true? Can I also get the filename and extension of the file?

Example PHP code:

curl_setopt ($ch, CURLOPT_HEADER, 1);
$fp = fopen($strFilePath, 'w');
curl_setopt($ch, CURLOPT_FILE, $fp);

Upvotes: 10

Views: 22622

Answers (5)

hanshenrik

Reputation: 21465

When you say

if the headers are set to true?

I'll assume you mean that CURLOPT_HEADER is set to true.

There are several ways to do it. My personal favorite is to use CURLOPT_HEADERFUNCTION instead of CURLOPT_HEADER, but that does not, strictly speaking, answer your question. If for some reason you are adamant about using CURLOPT_HEADER, you can separate the headers from the body with strpos() + substr(), e.g.:

<?php
declare(strict_types = 1);
$ch= curl_init();
curl_setopt_array($ch,array(
    CURLOPT_URL=>'http://example.org',
    CURLOPT_HEADER=>1,
    CURLOPT_RETURNTRANSFER=>1
));
$response = curl_exec($ch);
$header_body_separator = "\r\n\r\n";
$header_body_separator_position = strpos($response, $header_body_separator);
$separator_found = true;
if($header_body_separator_position === false){
    // no body is present?
    $header_body_separator_position = strlen($response);
    $separator_found = false;
}
$headers = substr($response,0, $header_body_separator_position);
$headers = trim($headers);
$headers = explode("\r\n",$headers);
$body = ($separator_found ? substr($response, $header_body_separator_position + strlen($header_body_separator)) : "");
var_export(["headers"=>$headers,"body"=>$body]);die();

This gives you:

array (
  'headers' => 
  array (
    0 => 'HTTP/1.1 200 OK',
    1 => 'Age: 240690',
    2 => 'Cache-Control: max-age=604800',
    3 => 'Content-Type: text/html; charset=UTF-8',
    4 => 'Date: Fri, 06 Nov 2020 09:47:18 GMT',
    5 => 'Etag: "3147526947+ident"',
    6 => 'Expires: Fri, 13 Nov 2020 09:47:18 GMT',
    7 => 'Last-Modified: Thu, 17 Oct 2019 07:18:26 GMT',
    8 => 'Server: ECS (nyb/1D20)',
    9 => 'Vary: Accept-Encoding',
    10 => 'X-Cache: HIT',
    11 => 'Content-Length: 1256',
  ),
  'body' => '<!doctype html>
<html>
<head>
    <title>Example Domain</title>
(...capped)

However, I don't recommend this approach, or using CURLOPT_HEADER at all. Instead I recommend using CURLOPT_HEADERFUNCTION, e.g.:

<?php
declare(strict_types = 1);
$ch = curl_init();
$headers = [];
curl_setopt_array($ch, array(
    CURLOPT_URL => 'http://example.org',
    CURLOPT_HEADERFUNCTION => function ($ch, string $header) use (&$headers): int {
        $header_trimmed = trim($header);
        if (strlen($header_trimmed) > 0) {
            $headers[] = $header_trimmed;
        }
        return strlen($header);
    },
    CURLOPT_RETURNTRANSFER => 1
));
$body = curl_exec($ch);
var_export([
    "headers" => $headers,
    "body" => $body
]);

which gives you the same result with much simpler code:

array (
  'headers' => 
  array (
    0 => 'HTTP/1.1 200 OK',
    1 => 'Age: 604109',
    2 => 'Cache-Control: max-age=604800',
    3 => 'Content-Type: text/html; charset=UTF-8',
    4 => 'Date: Fri, 06 Nov 2020 09:50:32 GMT',
    5 => 'Etag: "3147526947+ident"',
    6 => 'Expires: Fri, 13 Nov 2020 09:50:32 GMT',
    7 => 'Last-Modified: Thu, 17 Oct 2019 07:18:26 GMT',
    8 => 'Server: ECS (nyb/1D2E)',
    9 => 'Vary: Accept-Encoding',
    10 => 'X-Cache: HIT',
    11 => 'Content-Length: 1256',
  ),
  'body' => '<!doctype html>
<html>
<head>
(capped)

Another option is CURLINFO_HEADER_OUT, but I don't recommend using CURLINFO_HEADER_OUT in PHP because it is buggy: https://bugs.php.net/bug.php?id=65348
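To tie this back to the original question, here is a minimal sketch that combines CURLOPT_HEADERFUNCTION with CURLOPT_FILE, so the body is streamed straight into a file while the headers are collected separately. The helper names makeHeaderCollector() and downloadWithHeaders() are made up for illustration, and the error handling is only one possible choice. If redirects are followed, the header lines of every response in the chain end up in the same array.

```php
<?php
declare(strict_types=1);

/**
 * Returns a CURLOPT_HEADERFUNCTION callback that appends each non-empty
 * header line to $headers. (Hypothetical helper name.)
 */
function makeHeaderCollector(array &$headers): callable
{
    return function ($ch, string $line) use (&$headers): int {
        $trimmed = trim($line);
        if ($trimmed !== '') {
            $headers[] = $trimmed;
        }
        // cURL expects the number of bytes handled to be returned.
        return strlen($line);
    };
}

/**
 * Streams the body of $url into $path and returns the header lines.
 * (Hypothetical helper; throws RuntimeException on failure.)
 */
function downloadWithHeaders(string $url, string $path): array
{
    $fp = fopen($path, 'wb');
    if ($fp === false) {
        throw new RuntimeException("Cannot open $path for writing");
    }
    $headers = [];
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_URL            => $url,
        CURLOPT_FILE           => $fp, // body goes straight to the file
        CURLOPT_HEADERFUNCTION => makeHeaderCollector($headers),
    ]);
    $ok = curl_exec($ch);
    $error = curl_error($ch);
    fclose($fp);
    if ($ok === false) {
        throw new RuntimeException("Download failed: $error");
    }
    return $headers;
}
```

Because CURLOPT_HEADER is left off here, the file should contain only the body, and a call such as downloadWithHeaders('http://example.org', 'example.html') would return the header lines as an array.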

Upvotes: 2

Attila Antal

Reputation: 821

I believe you have found your answer by now. However, I'd like to share a script that works well for me: it sends a JSON request to a server, which returns the file as binary data, and then passes the file on to the browser as a download on the fly. Saving it on the server is not necessary. Hope it helps!

NOTE: Converting the POST data to JSON is optional.

<?php

// Username or E-mail
$login = 'username';
// Password
$password = 'password';
// API Request
$url = 'https://example.com/api';
// POST data
$data = array('someTask', 24);
// Convert POST data to json
$data_string = json_encode($data);
// initialize cURL
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL,$url);
curl_setopt($ch, CURLOPT_HTTPAUTH, CURLAUTH_BASIC);
curl_setopt($ch, CURLOPT_USERPWD, "$login:$password");
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, "POST");
curl_setopt($ch, CURLOPT_POSTFIELDS, $data_string);
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);

// Execute cURL and store the response in a variable
$file = curl_exec($ch);

// Get the Header Size
$header_size = curl_getinfo($ch, CURLINFO_HEADER_SIZE);
// Get the Header from response
$header = substr($file, 0, $header_size);
// Get the Body from response
$body = substr($file, $header_size);
// Explode Header rows into an array
$header_items = explode("\n", $header);
// Close cURL handler
curl_close($ch);

// define new variable for the File name
$file_name = null;

// Find the filename in the headers.
if(!preg_match('/filename="(.*?)"/', $header, $matches)){
    // If filename not found do something...
    echo "Unable to find filename.<br>Please check the Response Headers or Header parsing!";
    exit();
} else {
    // If filename was found assign the name to the variable above 
    $file_name = $matches[1];
}
// Check header response, if HTTP response is not 200, then display the error.
if(!preg_match('/200/', $header_items[0])){
    echo '<pre>'.print_r($header_items[0], true).'</pre>';
    exit();
} else {
    // Check header response, if HTTP response is 200, then proceed further.

    // Set the header for PHP to tell it, we would like to download a file
    header('Content-Description: File Transfer');
    header('Content-Type: application/octet-stream');
    header('Content-Transfer-Encoding: binary');
    header('Expires: 0');
    header('Cache-Control: must-revalidate');
    header('Pragma: public');
    header('Content-Disposition: attachment; filename='.$file_name);

    // Echo out the body only ($file still contains the response headers),
    // which then should trigger the download
    echo $body;
    exit;
}

?>

Upvotes: 6

Mel

Reputation: 6157

Below is a complete example that uses a class. The header parsing is more elaborate than it needs to be, because I was laying the groundwork for full hierarchical header storage.

I just noticed that init() should reset a lot more variables if the instance is to be reused for more URLs, but this should at least give you a basis for downloading a file to the filename sent by the server.

<?php
/*
 * vim: ts=4 sw=4 fdm=marker noet tw=78
 */
class curlDownloader
{
    private $remoteFileName = NULL;
    private $ch = NULL;
    private $headers = array();
    private $response = NULL;
    private $fp = NULL;
    private $debug = FALSE;
    private $fileSize = 0;

    const DEFAULT_FNAME = 'remote.out';

    public function __construct($url)
    {
        $this->init($url);
    }

    public function toggleDebug()
    {
        $this->debug = !$this->debug;
    }

    public function init($url)
    {
        if( !$url )
            throw new InvalidArgumentException("Need a URL");

        $this->ch = curl_init();
        curl_setopt($this->ch, CURLOPT_URL, $url);
        curl_setopt($this->ch, CURLOPT_HEADERFUNCTION,
            array($this, 'headerCallback'));
        curl_setopt($this->ch, CURLOPT_WRITEFUNCTION,
            array($this, 'bodyCallback'));
    }

    public function headerCallback($ch, $string)
    {
        $len = strlen($string);
        if( !strstr($string, ':') )
        {
            $this->response = trim($string);
            return $len;
        }
        list($name, $value) = explode(':', $string, 2);
        if( strcasecmp($name, 'Content-Disposition') == 0 )
        {
            $parts = explode(';', $value);
            if( count($parts) > 1 )
            {
                foreach($parts AS $crumb)
                {
                    if( strstr($crumb, '=') )
                    {
                        // Limit to 2 parts so a '=' inside the value is kept
                        list($pname, $pval) = explode('=', $crumb, 2);
                        $pname = trim($pname);
                        if( strcasecmp($pname, 'filename') == 0 )
                        {
                            // Using basename to prevent path injection
                            // in malicious headers.
                            $this->remoteFileName = basename(
                                $this->unquote(trim($pval)));
                            $this->fp = fopen($this->remoteFileName, 'wb');
                        }
                    }
                }
            }
        }

        $this->headers[$name] = trim($value);
        return $len;
    }
    public function bodyCallback($ch, $string)
    {
        if( !$this->fp )
        {
            trigger_error("No remote filename received, trying default",
                E_USER_WARNING);
            $this->remoteFileName = self::DEFAULT_FNAME;
            $this->fp = fopen($this->remoteFileName, 'wb');
            if( !$this->fp )
                throw new RuntimeException("Can't open default filename");
        }
        $len = fwrite($this->fp, $string);
        $this->fileSize += $len;
        return $len;
    }

    public function download()
    {
        $retval = curl_exec($this->ch);
        if( $this->debug )
            var_dump($this->headers);
        // $fp is never opened if no filename header and no body arrived
        if( $this->fp )
            fclose($this->fp);
        curl_close($this->ch);
        return $this->fileSize;
    }

    public function getFileName() { return $this->remoteFileName; }

    private function unquote($string)
    {
        return str_replace(array("'", '"'), '', $string);
    }
}

$dl = new curlDownloader(
    'https://dl.example.org/torrent/cool-movie/4358-hash/download.torrent'
);
$size = $dl->download();
printf("Downloaded %u bytes to %s\n", $size, $dl->getFileName());
?>

Upvotes: 4

Daniel Stenberg

Reputation: 58002

To get both headers and data, separately, you typically use both a header callback and a body callback, as in this example: http://curl.haxx.se/libcurl/php/examples/callbacks.html

To get the file name from the headers, you need to check for a Content-Disposition: header and extract the file name from there (if present) or just use the file name part from the URL or similar. Your choice.
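As a rough sketch of that choice, the function below looks for a Content-Disposition header first and falls back to the last path segment of the URL. The name pickFileName(), its parameters, and the 'download.bin' default are assumptions made for illustration. It only handles the plain filename= parameter (not the encoded filename*= form) and uses basename() so a hostile header cannot inject a path.

```php
<?php
declare(strict_types=1);

/**
 * Picks a local file name for a download. (Hypothetical helper.)
 * $headerLines: response header lines, e.g. collected by a header callback.
 */
function pickFileName(array $headerLines, string $url, string $default = 'download.bin'): string
{
    foreach ($headerLines as $line) {
        if (stripos($line, 'Content-Disposition:') !== 0) {
            continue;
        }
        // Matches filename="report.pdf" as well as filename=report.pdf
        if (preg_match('/filename\s*=\s*"?([^";]+)"?/i', $line, $m)) {
            $name = basename(trim($m[1]));
            if ($name !== '') {
                return $name;
            }
        }
    }
    // Fall back to the last path segment of the URL.
    $path = parse_url($url, PHP_URL_PATH);
    $name = is_string($path) ? basename($path) : '';
    return $name !== '' ? $name : $default;
}

$name = pickFileName(
    ['HTTP/1.1 200 OK', 'Content-Disposition: attachment; filename="report.pdf"'],
    'https://example.com/api/export?id=7'
);
// The extension can then be read from the chosen name.
$extension = pathinfo($name, PATHINFO_EXTENSION);
```

The extension obtained this way reflects only what the server or URL claims, so it may be worth checking the Content-Type header as well before trusting it.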

Upvotes: 1

Sujit Agarwal

Reputation: 12508

Download file or web page using PHP cURL and save it to file

<?php
/**
* Initialize the cURL session
*/
$ch = curl_init();
/**
* Set the URL of the page or file to download.
*/
curl_setopt($ch, CURLOPT_URL,
'http://news.google.com/news?hl=en&topic=t&output=rss');
/**
* Create a new file
*/
$fp = fopen('rss.xml', 'w');
/**
* Ask cURL to write the contents to a file
*/
curl_setopt($ch, CURLOPT_FILE, $fp);
/**
* Execute the cURL session
*/
curl_exec ($ch);
/**
* Close cURL session and file
*/
curl_close ($ch);
fclose($fp);
?>

Upvotes: 7
