Reputation: 58004
I am downloading a CSV file from another server as a data feed from a vendor.
I am using curl to get the contents of the file and saving that into a variable called $contents.
I can get to that part just fine, but I tried exploding by \r and \n to get an array of lines, and it fails with an 'out of memory' error.
I echo strlen($contents) and it's about 30.5 million chars.
I need to manipulate the values and insert them into a database. What do I need to do to avoid memory allocation errors?
Upvotes: 11
Views: 27868
Reputation: 2672
Darren Cook's comment on Pascal MARTIN's answer is really interesting. In modern PHP + cURL versions, the CURLOPT_WRITEFUNCTION option can be set so that cURL invokes a callback for each received "chunk" of data. Specifically, the callable receives two parameters: the first is the invoking curl handle, and the second is the chunk of data. The callback should return strlen($data) for curl to continue sending more data.
Callables can be methods in PHP. Using all this, I've developed a possible solution that I find more readable than the previous one (although Pascal Martin's answer is really great, things have changed since then). I've used public attributes for simplicity, but I'm sure readers could adapt and improve the code. You can even abort the cURL request once a certain number of lines (or bytes) has been reached. I hope this is useful for others.
<?php
class SplitCurlByLines {
    public $currentLine = '';
    public $totalLineCount = 0;
    public $totalLength = 0;

    public function curlCallback($curl, $data) {
        $this->currentLine .= $data;
        $lines = explode("\n", $this->currentLine);
        // The last line could be unfinished. We should not
        // process it yet.
        $numLines = count($lines) - 1;
        $this->currentLine = $lines[$numLines]; // Save for the next callback.
        for ($i = 0; $i < $numLines; ++$i) {
            $this->processLine($lines[$i]); // Do whatever you want.
            ++$this->totalLineCount; // Statistics.
            $this->totalLength += strlen($lines[$i]) + 1;
        }
        return strlen($data); // Ask curl for more data (a different value will stop the transfer).
    }

    public function processLine($str) {
        // Do whatever you want (split CSV, ...).
        echo $str . "\n";
    }
} // SplitCurlByLines

// Just for testing, I will echo the content of the Stack Overflow
// main page. To avoid artifacts, I will inform the browser about the
// plain text MIME type, so the source code should be visible.
header('Content-type: text/plain');

$splitter = new SplitCurlByLines();

// Configuration of curl
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://stackoverflow.com/");
curl_setopt($ch, CURLOPT_WRITEFUNCTION, array($splitter, 'curlCallback'));
curl_exec($ch);

// Process the last line.
$splitter->processLine($splitter->currentLine);

curl_close($ch);

error_log($splitter->totalLineCount . " lines; " .
    $splitter->totalLength . " bytes.");
?>
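For the original question (CSV rows going into a database), processLine() could be adapted along these lines. This is only a sketch added for illustration: the PDO connection, the vendor_feed table and its columns are placeholders, not part of the original answer.
class CsvToDbSplitter extends SplitCurlByLines {
    private $pdo;

    public function __construct(PDO $pdo) {
        $this->pdo = $pdo; // Database connection supplied by the caller.
    }

    public function processLine($str) {
        if ($str === '') {
            return; // Ignore blank lines (e.g. the trailing one).
        }
        $fields = str_getcsv($str); // Split the CSV row into its values.
        // Hypothetical table and columns; adjust to your schema.
        $stmt = $this->pdo->prepare(
            'INSERT INTO vendor_feed (col_a, col_b, col_c) VALUES (?, ?, ?)'
        );
        $stmt->execute(array_slice($fields, 0, 3));
    }
}
// Usage sketch:
// $splitter = new CsvToDbSplitter(new PDO('mysql:host=localhost;dbname=test', 'user', 'pass'));
// ...then pass array($splitter, 'curlCallback') to CURLOPT_WRITEFUNCTION exactly as above.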
Upvotes: 14
Reputation: 401162
As other answers said, you can use CURLOPT_FILE to write the data to a file as it is downloaded.
But you might not want to actually create a file; you might want to work with the data in memory, using it as soon as it "arrives".
One possible solution is to define your own stream wrapper and use that, instead of a real file, with CURLOPT_FILE.
First of all, see the PHP manual's documentation on stream wrappers (the streamWrapper class prototype) and stream_wrapper_register().
And now, let's go with an example.
First, let's create our stream wrapper class:
class MyStream {
    protected $buffer = '';

    public function stream_open($path, $mode, $options, &$opened_path) {
        // Has to be declared, it seems...
        return true;
    }

    public function stream_write($data) {
        // Extract the lines; in my tests, $data was 8192 bytes long; never more.
        $lines = explode("\n", $data);
        // The buffer contains the end of the last line from the previous call
        // => it goes at the beginning of the first line we are getting this time.
        $lines[0] = $this->buffer . $lines[0];
        // And the last line is only partial
        // => save it for next time, and remove it from the list this time.
        $nb_lines = count($lines);
        $this->buffer = $lines[$nb_lines - 1];
        unset($lines[$nb_lines - 1]);
        // Here, do your work with the lines you have in the buffer.
        var_dump($lines);
        echo '<hr />';
        return strlen($data);
    }
}
What I do in stream_write is split the received chunk into lines, prepend the leftover from the previous call to the first line, and keep the last (possibly partial) line in the buffer for the next call.
Next, we register this stream wrapper, to be used with the pseudo-protocol "test":
// Register the wrapper
stream_wrapper_register("test", "MyStream")
    or die("Failed to register protocol");
And now, we do our curl request, like we would when writing to a "real" file, as other answers suggested:
// Open the "file"
$fp = fopen("test://MyTestVariableInMemory", "r+");
// Configuration of curl
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://www.rue89.com/");
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_BUFFERSIZE, 256);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FILE, $fp); // Data will be sent to our stream ;-)
curl_exec($ch);
curl_close($ch);
// Don't forget to close the "file" / stream
fclose($fp);
Note that we don't work with a real file, but with our pseudo-protocol.
This way, each time a chunk of data arrives, the MyStream::stream_write method will get called, and will be able to work on a small amount of data (when I tested, I always got 8192 bytes, whatever value I used for CURLOPT_BUFFERSIZE).
A few notes :
Still, I hope this helps ;-)
Have fun !
Upvotes: 52
Reputation: 50928
You might want to consider saving it to a temporary file, and then reading it one line at a time using fgets or fgetcsv.
This way you avoid the initial big array you get from exploding such a large string.
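For example, something along these lines (just a sketch, not from the original answer; the vendor URL, the temporary file handling and the commented-out processRow() call are placeholders):
<?php
// Sketch: download to a temporary file with curl, then read it back one CSV row at a time.
$tmpFile = tempnam(sys_get_temp_dir(), 'feed'); // Temporary file for the download.
$fp = fopen($tmpFile, 'w');

$ch = curl_init('https://example.com/feed.csv'); // Placeholder vendor URL.
curl_setopt($ch, CURLOPT_FILE, $fp);             // Stream the response straight to the file.
curl_exec($ch);
curl_close($ch);
fclose($fp);

// Re-open the file and process it row by row; only one line is in memory at a time.
$fp = fopen($tmpFile, 'r');
while (($row = fgetcsv($fp)) !== false) {
    // $row is an array of the fields of one CSV line.
    // processRow($row); // e.g. manipulate the values and insert them into the database.
}
fclose($fp);
unlink($tmpFile); // Clean up the temporary file.
?>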
Upvotes: 5
Reputation: 146
NB:
"Basically, if you open a file with fopen, fclose it and then unlink it, it works fine. But if between fopen and fclose, you give the file handle to cURL to do some writing into the file, then the unlink fails. Why this is happening is beyond me. I think it may be related to Bug #48676"
http://bugs.php.net/bug.php?id=49517
So be careful if you're on an older version of PHP. There is a simple fix on this page to double-close the file resource:
fclose($fp);
if (is_resource($fp))
    fclose($fp);
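In context, the sequence the note describes looks roughly like this (a sketch only; the file path and URL are placeholders):
<?php
// Sketch of the situation described above: cURL writes into an open file handle,
// and on affected PHP versions the unlink() afterwards can fail unless the
// handle is closed a second time. Path and URL are placeholders.
$fp = fopen('/tmp/feed.csv', 'w');

$ch = curl_init('https://example.com/feed.csv');
curl_setopt($ch, CURLOPT_FILE, $fp); // cURL does the writing into the file.
curl_exec($ch);
curl_close($ch);

fclose($fp);
if (is_resource($fp)) {
    fclose($fp); // Workaround: close again if the handle is somehow still open.
}

// ... read and process the file here ...
unlink('/tmp/feed.csv'); // The call that failed on the affected PHP versions.
?>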
Upvotes: 0
Reputation: 166126
PHP is choking because it's running out of memory. Instead of having curl populate a PHP variable with the contents of the file, use the CURLOPT_FILE option to save the file to disk instead.
// Pseudo, untested code to give you the idea
$fp = fopen('path/to/save/file', 'w');
$ch = curl_init('http://example.com/feed.csv'); // Placeholder URL for the vendor feed.
curl_setopt($ch, CURLOPT_FILE, $fp);
curl_exec($ch);
curl_close($ch);
fclose($fp);
Then, once the file is saved, instead of using the file or file_get_contents functions (which would load the entire file into memory, killing PHP again), use fopen and fgets to read the file one line at a time.
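A minimal sketch of that read loop (the path and the per-line handling are placeholders):
<?php
// Sketch: read the saved file one line at a time, so only a single line
// is held in memory at once. 'path/to/save/file' is the placeholder from above.
$fp = fopen('path/to/save/file', 'r');
while (($line = fgets($fp)) !== false) {
    $fields = str_getcsv($line); // Split the CSV line into its values.
    // ... manipulate $fields and insert them into your database here ...
}
fclose($fp);
?>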
Upvotes: 18
Reputation: 12510
Increase memory_limit in php.ini.
Or read the file with fopen() and fgets().
Upvotes: 3
Reputation: 60997
Spool it to a file. Don't try to hold all that data in memory at once.
Upvotes: 2