Reputation: 21449
How can I use cURL to download a file in PHP if the headers are set to true? Can I also get the filename and extension of the file?
Example PHP code:
curl_setopt ($ch, CURLOPT_HEADER, 1);
$fp = fopen($strFilePath, 'w');
curl_setopt($ch, CURLOPT_FILE, $fp);
Upvotes: 10
Views: 22622
Reputation: 21465
When you say
if the headers are set to true?
I'll assume you mean that CURLOPT_HEADER is set to true.
There are several ways to do it. My personal favorite is to use CURLOPT_HEADERFUNCTION instead of CURLOPT_HEADER, but that does not, strictly speaking, answer your question. If for some reason you are adamant about using CURLOPT_HEADER, you can separate the headers from the body with strpos() + substr(),
e.g.:
<?php
declare(strict_types = 1);
$ch= curl_init();
curl_setopt_array($ch,array(
CURLOPT_URL=>'http://example.org',
CURLOPT_HEADER=>1,
CURLOPT_RETURNTRANSFER=>1
));
$response = curl_exec($ch);
$header_body_separator = "\r\n\r\n";
$header_body_separator_position = strpos($response, $header_body_separator);
$separator_found = true;
if($header_body_separator_position === false){
// no body is present?
$header_body_separator_position = strlen($response);
$separator_found = false;
}
$headers = substr($response,0, $header_body_separator_position);
$headers = trim($headers);
$headers = explode("\r\n",$headers);
$body = ($separator_found ? substr($response, $header_body_separator_position + strlen($header_body_separator)) : "");
var_export(["headers"=>$headers,"body"=>$body]);die();
gives you
array (
'headers' =>
array (
0 => 'HTTP/1.1 200 OK',
1 => 'Age: 240690',
2 => 'Cache-Control: max-age=604800',
3 => 'Content-Type: text/html; charset=UTF-8',
4 => 'Date: Fri, 06 Nov 2020 09:47:18 GMT',
5 => 'Etag: "3147526947+ident"',
6 => 'Expires: Fri, 13 Nov 2020 09:47:18 GMT',
7 => 'Last-Modified: Thu, 17 Oct 2019 07:18:26 GMT',
8 => 'Server: ECS (nyb/1D20)',
9 => 'Vary: Accept-Encoding',
10 => 'X-Cache: HIT',
11 => 'Content-Length: 1256',
),
'body' => '<!doctype html>
<html>
<head>
<title>Example Domain</title>
(...capped)
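But I don't recommend this approach, and I don't recommend using CURLOPT_HEADER at all: splitting on the first blank line goes wrong when the response carries more than one header block, for example after a followed redirect. Instead I recommend CURLOPT_HEADERFUNCTION, e.g.: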
<?php
declare(strict_types = 1);
$ch = curl_init();
$headers = [];
curl_setopt_array($ch, array(
CURLOPT_URL => 'http://example.org',
CURLOPT_HEADERFUNCTION => function ($ch, string $header) use (&$headers): int {
$header_trimmed = trim($header);
if (strlen($header_trimmed) > 0) {
$headers[] = $header_trimmed;
}
return strlen($header);
},
CURLOPT_RETURNTRANSFER => 1
));
$body = curl_exec($ch);
var_export([
"headers" => $headers,
"body" => $body
]);
which gives you the same result with much simpler code:
array (
'headers' =>
array (
0 => 'HTTP/1.1 200 OK',
1 => 'Age: 604109',
2 => 'Cache-Control: max-age=604800',
3 => 'Content-Type: text/html; charset=UTF-8',
4 => 'Date: Fri, 06 Nov 2020 09:50:32 GMT',
5 => 'Etag: "3147526947+ident"',
6 => 'Expires: Fri, 13 Nov 2020 09:50:32 GMT',
7 => 'Last-Modified: Thu, 17 Oct 2019 07:18:26 GMT',
8 => 'Server: ECS (nyb/1D2E)',
9 => 'Vary: Accept-Encoding',
10 => 'X-Cache: HIT',
11 => 'Content-Length: 1256',
),
'body' => '<!doctype html>
<html>
<head>
(capped)
Another option that gets mentioned is CURLINFO_HEADER_OUT, but note that it reports the request headers that were sent, and I don't recommend using it in PHP because it is buggy: https://bugs.php.net/bug.php?id=65348
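Once the header lines are collected (with either approach above), looking up one specific header is easier with a name => value map. The following is only a minimal sketch: header_lines_to_map() is a hypothetical helper, not part of the cURL extension, and it assumes the lines are already trimmed like the $headers array above. If a header name repeats, the last value wins.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: turn raw header lines into a
 * lowercase-name => value map. Repeated names keep the last value.
 */
function header_lines_to_map(array $lines): array
{
    $map = [];
    foreach ($lines as $line) {
        // Skip status lines such as "HTTP/1.1 200 OK".
        if (strncasecmp($line, 'HTTP/', 5) === 0) {
            continue;
        }
        $parts = explode(':', $line, 2);
        if (count($parts) !== 2) {
            continue; // not a "Name: value" line
        }
        // Header names are case-insensitive, so normalise them.
        $map[strtolower(trim($parts[0]))] = trim($parts[1]);
    }
    return $map;
}

// Sample lines taken from the output shown above.
$headers = [
    'HTTP/1.1 200 OK',
    'Content-Type: text/html; charset=UTF-8',
    'Content-Length: 1256',
];
$map = header_lines_to_map($headers);
echo $map['content-type'], "\n";   // text/html; charset=UTF-8
echo $map['content-length'], "\n"; // 1256
```

The same lookup works for a Content-Disposition header, if the server sends one, via $map['content-disposition'] ?? null.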
Upvotes: 2
Reputation: 821
I believe you have found your answer by now. However, I'd like to share my script, which works well: it sends a JSON request to a server that returns the file as binary data, then passes it on to the browser as a download on the fly. Saving the file on the server is not necessary. Hope it helps!
NOTE: You can avoid converting the post data to json.
<?php
// Username or E-mail
$login = 'username';
// Password
$password = 'password';
// API Request
$url = 'https://example.com/api';
// POST data
$data = array('someTask', 24);
// Convert POST data to json
$data_string = json_encode($data);
// initialize cURL
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL,$url);
curl_setopt($ch, CURLOPT_HTTPAUTH, CURLAUTH_BASIC);
curl_setopt($ch, CURLOPT_USERPWD, "$login:$password");
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, "POST");
curl_setopt($ch, CURLOPT_POSTFIELDS, $data_string);
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
// Execute cURL and store the response in a variable
$file = curl_exec($ch);
// Get the Header Size
$header_size = curl_getinfo($ch, CURLINFO_HEADER_SIZE);
// Get the Header from response
$header = substr($file, 0, $header_size);
// Get the Body from response
$body = substr($file, $header_size);
// Explode Header rows into an array
$header_items = explode("\n", $header);
// Close cURL handler
curl_close($ch);
// define new variable for the File name
$file_name = null;
// find the filename in the headers.
if(!preg_match('/filename="(.*?)"/', $header, $matches)){
// If filename not found do something...
echo "Unable to find filename.<br>Please check the Response Headers or Header parsing!";
exit();
} else {
// If filename was found assign the name to the variable above
$file_name = $matches[1];
}
// Check header response, if HTTP response is not 200, then display the error.
if(!preg_match('/200/', $header_items[0])){
echo '<pre>'.print_r($header_items[0], true).'</pre>';
exit();
} else {
// Check header response, if HTTP response is 200, then proceed further.
// Set the header for PHP to tell it, we would like to download a file
header('Content-Description: File Transfer');
header('Content-Type: application/octet-stream');
header('Content-Transfer-Encoding: binary');
header('Expires: 0');
header('Cache-Control: must-revalidate');
header('Pragma: public');
header('Content-Disposition: attachment; filename="'.$file_name.'"');
// Echo out the body only ($file still contains the response headers), which then should trigger the download
echo $body;
exit;
}
?>
Upvotes: 6
Reputation: 6157
Below is a complete example that uses a class. The header parsing is more elaborate than it needs to be, because I was laying the groundwork for full hierarchical header storage.
I just noticed that init() should reset a lot more variables if the instance is to be reused for more URLs, but this should at least give you a base for downloading a file to a filename sent by the server.
<?php
/*
* vim: ts=4 sw=4 fdm=marker noet tw=78
*/
class curlDownloader
{
private $remoteFileName = NULL;
private $ch = NULL;
private $headers = array();
private $response = NULL;
private $fp = NULL;
private $debug = FALSE;
private $fileSize = 0;
const DEFAULT_FNAME = 'remote.out';
public function __construct($url)
{
$this->init($url);
}
public function toggleDebug()
{
$this->debug = !$this->debug;
}
public function init($url)
{
if( !$url )
throw new InvalidArgumentException("Need a URL");
$this->ch = curl_init();
curl_setopt($this->ch, CURLOPT_URL, $url);
curl_setopt($this->ch, CURLOPT_HEADERFUNCTION,
array($this, 'headerCallback'));
curl_setopt($this->ch, CURLOPT_WRITEFUNCTION,
array($this, 'bodyCallback'));
}
public function headerCallback($ch, $string)
{
$len = strlen($string);
if( !strstr($string, ':') )
{
$this->response = trim($string);
return $len;
}
list($name, $value) = explode(':', $string, 2);
if( strcasecmp($name, 'Content-Disposition') == 0 )
{
$parts = explode(';', $value);
if( count($parts) > 1 )
{
foreach($parts AS $crumb)
{
if( strstr($crumb, '=') )
{
// Limit to 2 parts so a value containing '=' is kept whole.
list($pname, $pval) = explode('=', $crumb, 2);
$pname = trim($pname);
if( strcasecmp($pname, 'filename') == 0 )
{
// Using basename to prevent path injection
// in malicious headers.
$this->remoteFileName = basename(
$this->unquote(trim($pval)));
$this->fp = fopen($this->remoteFileName, 'wb');
}
}
}
}
}
$this->headers[$name] = trim($value);
return $len;
}
public function bodyCallback($ch, $string)
{
if( !$this->fp )
{
trigger_error("No remote filename received, trying default",
E_USER_WARNING);
$this->remoteFileName = self::DEFAULT_FNAME;
$this->fp = fopen($this->remoteFileName, 'wb');
if( !$this->fp )
throw new RuntimeException("Can't open default filename");
}
$len = fwrite($this->fp, $string);
$this->fileSize += $len;
return $len;
}
public function download()
{
$retval = curl_exec($this->ch);
if( $this->debug )
var_dump($this->headers);
// fp stays NULL when no body was received, so guard the close.
if( $this->fp )
fclose($this->fp);
curl_close($this->ch);
return $this->fileSize;
}
public function getFileName() { return $this->remoteFileName; }
private function unquote($string)
{
return str_replace(array("'", '"'), '', $string);
}
}
$dl = new curlDownloader(
'https://dl.example.org/torrent/cool-movie/4358-hash/download.torrent'
);
$size = $dl->download();
printf("Downloaded %u bytes to %s\n", $size, $dl->getFileName());
?>
Upvotes: 4
Reputation: 58002
To get headers and data separately, you typically use both a header callback and a body callback, as in this example: http://curl.haxx.se/libcurl/php/examples/callbacks.html
To get the file name from the headers, you need to check for a Content-Disposition: header and extract the file name from there (if present) or just use the file name part from the URL or similar. Your choice.
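The choice described above could be sketched roughly as follows. resolve_download_name() and the 'download.bin' default are hypothetical names introduced for illustration. The sketch only handles the plain filename=... form (the extended filename*= form is ignored) and falls back to the last path segment of the URL. The extension then comes from pathinfo().

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: pick a local file name for a download.
 * Prefers Content-Disposition, falls back to the URL path,
 * then to $default. Only the plain filename=... form is parsed.
 */
function resolve_download_name(
    ?string $contentDisposition,
    string $url,
    string $default = 'download.bin'
): string {
    $name = '';
    if ($contentDisposition !== null
        && preg_match('/filename\s*=\s*"?([^";]+)"?/i', $contentDisposition, $m)
    ) {
        $name = trim($m[1]);
    }
    if ($name === '') {
        // Fall back to the last segment of the URL path.
        $path = parse_url($url, PHP_URL_PATH);
        if (is_string($path)) {
            $name = rawurldecode($path);
        }
    }
    // Treat backslashes as separators too, then strip any directory
    // part so a hostile header cannot point outside the target folder.
    $name = basename(str_replace('\\', '/', $name));
    if ($name === '' || $name === '.' || $name === '..') {
        return $default;
    }
    return $name;
}

$name = resolve_download_name('attachment; filename="report.pdf"', 'https://example.com/api');
echo $name, ' (', pathinfo($name, PATHINFO_EXTENSION), ")\n"; // report.pdf (pdf)

$name = resolve_download_name(null, 'https://example.com/files/archive.tar.gz?x=1');
echo $name, ' (', pathinfo($name, PATHINFO_EXTENSION), ")\n"; // archive.tar.gz (gz)
```

Note that pathinfo() reports only the last extension, so archive.tar.gz yields gz rather than tar.gz.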
Upvotes: 1
Reputation: 12508
Download file or web page using PHP cURL and save it to file
<?php
/**
* Initialize the cURL session
*/
$ch = curl_init();
/**
* Set the URL of the page or file to download.
*/
curl_setopt($ch, CURLOPT_URL,
'http://news.google.com/news?hl=en&topic=t&output=rss');
/**
* Create a new file
*/
$fp = fopen('rss.xml', 'w');
/**
* Ask cURL to write the contents to a file
*/
curl_setopt($ch, CURLOPT_FILE, $fp);
/**
* Execute the cURL session
*/
curl_exec ($ch);
/**
* Close cURL session and file
*/
curl_close ($ch);
fclose($fp);
?>
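If you also need the response headers while the body goes straight to disk, one possible variation is to keep CURLOPT_HEADER off (otherwise the headers end up inside the saved file) and collect the headers through CURLOPT_HEADERFUNCTION. This is a sketch under that assumption, not a drop-in replacement: download_with_headers() is a hypothetical helper name, and error handling is kept to the bare minimum.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: stream the response body to $path and
 * return the response header lines as an array of strings.
 */
function download_with_headers(string $url, string $path): array
{
    $fp = fopen($path, 'wb');
    if ($fp === false) {
        throw new RuntimeException("Cannot open $path for writing");
    }

    $headers = [];
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_URL    => $url,
        CURLOPT_FILE   => $fp,     // body is written directly to the file
        CURLOPT_HEADER => false,   // keep the headers out of the file
        CURLOPT_HEADERFUNCTION => function ($ch, string $line) use (&$headers): int {
            $trimmed = trim($line);
            if ($trimmed !== '') {
                $headers[] = $trimmed;
            }
            // cURL expects the number of bytes handled.
            return strlen($line);
        },
    ]);

    $ok    = curl_exec($ch);
    $error = curl_error($ch);
    fclose($fp);

    if ($ok === false) {
        throw new RuntimeException("Download failed: $error");
    }
    return $headers;
}

// Example call (needs network access, so it is left commented out):
// $headers = download_with_headers('http://example.org', 'page.html');
// var_export($headers);
```

The returned lines have the same shape as the header arrays shown in the other answers, so a Content-Disposition line can be searched for a filename afterwards.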
Upvotes: 7