garethdn

Reputation: 12373

Downloading a very large XML file with PHP

I currently have a script that begins downloading a large (1.3GB) XML file from the web, but I have encountered a number of problems. This is my code:

    function readfile_chunked($filename) {
        $chunksize = 1 * (1024 * 1024); // read 1MB at a time

        // Open the remote file for reading...
        $handle = fopen($filename, 'rb');
        if ($handle === false) {
            return false;
        }

        // ...and the local copy for appending, once, outside the loop.
        $fh = fopen("test.xml", 'ab') or die("can't open file");

        while (!feof($handle)) {
            $buffer = fread($handle, $chunksize);
            fwrite($fh, $buffer);
        }

        fclose($fh);
        return fclose($handle);
    }

The first (and main) problem is the following error during the download:

Fatal error: Maximum execution time of 30 seconds exceeded in /Applications/MAMP/htdocs/test/test.php on line 53

As I understand it, this is basically a timeout. I've read about changing the timeout settings in php.ini, but I'm conscious that when this application goes live I won't be able to edit the php.ini file on the shared server.
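I gather the limit can also be raised at runtime rather than in php.ini; a minimal sketch, assuming the host hasn't disabled these calls (some shared hosts do):

    set_time_limit(0);                  // lift the per-request execution cap
    ini_set('max_execution_time', '0'); // equivalent ini override at runtime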

This problem brings me onto my next one: I want to implement some kind of error checking and prevention. For example, if the connection to the server goes down, I'd like to be able to resume when it is restored. I realise this may not be possible, though. An alternative might be to compare the filesizes of the local and remote files?

I also need to send an Accept-Encoding: gzip HTTP header with my request; a sketch covering both this and resuming follows.
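A minimal sketch of both ideas using cURL; the URL and filenames are placeholders, and it assumes the remote server honours Range requests:

    $remote = 'http://example.com/huge.xml';
    $local  = 'test.xml';

    // Pick up from wherever the previous attempt stopped.
    $offset = file_exists($local) ? filesize($local) : 0;

    $fh = fopen($local, 'ab') or die("can't open file");

    $ch = curl_init($remote);
    curl_setopt($ch, CURLOPT_FILE, $fh);            // stream straight to disk
    curl_setopt($ch, CURLOPT_RESUME_FROM, $offset); // sends a Range header
    curl_setopt($ch, CURLOPT_ENCODING, 'gzip');     // sends Accept-Encoding: gzip
    curl_setopt($ch, CURLOPT_FAILONERROR, true);    // treat HTTP errors as failures

    $ok = curl_exec($ch);
    curl_close($ch);
    fclose($fh);

One caveat: combining the two is unreliable in practice, since the Range offset refers to the compressed entity while the local file holds decoded bytes, so you would likely enable one or the other.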

And that would finally bring me to some kind of progress notification that I would like: presumably JavaScript constantly polling an endpoint that compares the local and remote filesizes?
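A minimal sketch of such an endpoint (progress.php is a hypothetical name; the total would come from a HEAD request or be hard-coded):

    // progress.php - reports how much of test.xml has arrived.
    header('Content-Type: application/json');

    $local = 'test.xml';
    $total = 1395864371; // placeholder: remote size in bytes

    clearstatcache();    // filesize() results are otherwise cached
    $done = file_exists($local) ? filesize($local) : 0;

    echo json_encode(array(
        'bytes'   => $done,
        'total'   => $total,
        'percent' => $total ? round($done / $total * 100, 1) : 0,
    ));

JavaScript would then poll this with setInterval and update a progress bar.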

The first two points, however, are the most important, as currently I can't download the file I require at all. Any help would be appreciated.

Upvotes: 0

Views: 1156

Answers (2)

Ian

Reputation: 900

I had a similar problem with PHP and inserted the following code to get around the execution-time problem:

    ignore_user_abort(true);          // keep running even if the browser disconnects
    set_time_limit(0);                // remove the 30-second execution limit
    ini_set('memory_limit', '2048M'); // allow more memory for the large file

Upvotes: 0

lobostome

Reputation: 433

Regarding your question about the timeout: I would suggest running that task as a cron job. When PHP runs from the command line, the default maximum execution time is 0 (no time limit). That way you avoid guessing how long the download will take, which varies with a number of factors. I believe the majority of shared hosts allow you to run cron jobs; a sketch of the setup follows.
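A hypothetical crontab entry (the paths and schedule are placeholders):

    # Fetch the file nightly at 2am via the CLI binary, where
    # max_execution_time defaults to 0.
    0 2 * * * /usr/bin/php /home/user/fetch_xml.php >> /home/user/fetch_xml.log 2>&1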

For download resuming and gzip, I would suggest using the PEAR package HTTP_Download.

It supports HTTP compression, caching and partial downloads, resuming, and sending raw data.
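A rough usage sketch (method names per the PEAR documentation; note the package serves files to a client rather than fetching remote ones, so it covers the delivery side, and the paths are placeholders):

    require_once 'HTTP/Download.php';

    $dl = new HTTP_Download();
    $dl->setFile('/path/to/test.xml');
    $dl->setContentDisposition(HTTP_DOWNLOAD_ATTACHMENT, 'test.xml');
    $dl->setContentType('application/xml');
    $dl->setGzip(true); // HTTP compression
    $dl->send();        // honours Range headers, so clients can resume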

Upvotes: 1
