André

Reputation: 25584

Badly performing function in PHP: memory blows up with large files! How can I refactor it?

I have a function that strips lines out of files. I'm dealing with large files (more than 100 MB). The PHP memory limit is set to 256 MB, but the function that strips out the lines blows up with a 100 MB CSV file.

What the function must do is this:

Originally I have a CSV like this:

    Copyright (c) 2007 MaxMind LLC. All Rights Reserved.
    locId,country,region,city,postalCode,latitude,longitude,metroCode,areaCode
    1,"O1","","","",0.0000,0.0000,,
    2,"AP","","","",35.0000,105.0000,,
    3,"EU","","","",47.0000,8.0000,,
    4,"AD","","","",42.5000,1.5000,,
    5,"AE","","","",24.0000,54.0000,,
    6,"AF","","","",33.0000,65.0000,,
    7,"AG","","","",17.0500,-61.8000,,
    8,"AI","","","",18.2500,-63.1667,,
    9,"AL","","","",41.0000,20.0000,,

When I pass the CSV file to this function I get:

    locId,country,region,city,postalCode,latitude,longitude,metroCode,areaCode
    1,"O1","","","",0.0000,0.0000,,
    2,"AP","","","",35.0000,105.0000,,
    3,"EU","","","",47.0000,8.0000,,
    4,"AD","","","",42.5000,1.5000,,
    5,"AE","","","",24.0000,54.0000,,
    6,"AF","","","",33.0000,65.0000,,
    7,"AG","","","",17.0500,-61.8000,,
    8,"AI","","","",18.2500,-63.1667,,
    9,"AL","","","",41.0000,20.0000,,

So the function does what it should: it only strips out the first line, nothing more. The problem is its performance with large files: it blows up the memory.

The function is:

 public function deleteLine($line_no, $csvFileName) {

  // this function strips a specific line from a file
  // if a line is stripped, the function returns TRUE, else FALSE
  //
  // e.g.
  // deleteLine(-1, 'xyz.csv'); // strip last line
  // deleteLine(1, 'xyz.csv');  // strip first line

  // assign the file name
  $filename = $csvFileName;

  $strip_return = FALSE;

  $data = file($filename);       // reads the whole file into memory at once
  $pipe = fopen($filename, 'w');
  $size = count($data);

  if ($line_no == -1) $skip = $size - 1;
  else $skip = $line_no - 1;

  for ($line = 0; $line < $size; $line++)
   if ($line != $skip)
    fputs($pipe, $data[$line]);
   else
    $strip_return = TRUE;

  return $strip_return;
 }

Is it possible to refactor this function so it doesn't blow past the 256 MB PHP memory limit?

Give me some clues.

Best Regards,

Upvotes: 1

Views: 415

Answers (3)

dnagirl

Reputation: 20446

Well, the easiest answer is: don't do it with PHP. Seriously, sed would work much better for this because the whole file is never in memory. Check out the classic sed one-liners, but essentially:

sed '1d' filename

I know system calls are frowned upon, but I think this may be a case when one is warranted.
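
If you do go that route, the call from PHP might look something like this. This is only a minimal sketch: it assumes GNU sed is installed on the server (its -i flag rewrites the file in place), and xyz.csv is just the placeholder name from the question.

    // sketch only: assumes GNU sed on the host and a trusted file path.
    $file = escapeshellarg('xyz.csv');   // placeholder file name

    // '1d' deletes the first line; -i rewrites the file in place,
    // so the file never has to be loaded into PHP's memory.
    exec("sed -i '1d' $file", $output, $status);

    if ($status !== 0) {
        // sed failed (file missing, no permission, sed not installed, ...)
        die('Could not strip the line.');
    }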

Upvotes: 1

codaddict

Reputation: 455350

The cause of your blowout is the file() function, which brings the entire file into memory. To overcome this, read the file line by line, write every line except the one to be deleted to a temporary file, and finally rename the temporary file to the original name.

public function deleteLine($line_no, $csvFileName) {

        // get a temp file name in the current working directory..you can use
        // any other directory, say /tmp
        $tmpFileName = tempnam(".", "csv");

        $strip_return = FALSE;

        // open the input file for reading.
        $readFD = fopen($csvFileName, 'r');

        // temp file for writing.
        $writeFD = fopen($tmpFileName, 'w');

        // check for fopen errors here.

        if ($line_no == -1) {
                // stripping the last line needs the line count, so make
                // one streaming pass to count the lines first.
                $size = 0;
                while (fgets($readFD) !== false) {
                        $size++;
                }
                rewind($readFD);
                $skip = $size - 1;
        } else {
                $skip = $line_no - 1;
        }

        $line = 0;

        // read lines from the input file one by one.
        // write all lines except the line to be deleted.
        while (($buffer = fgets($readFD)) !== false) {
                if ($line != $skip)
                        fputs($writeFD, $buffer);
                else
                        $strip_return = TRUE;
                $line++;
        }

        // close both handles before renaming.
        fclose($readFD);
        fclose($writeFD);

        // rename the temp file to the input file.
        rename($tmpFileName, $csvFileName);

        return $strip_return;
}

Upvotes: 2

Jody

Reputation: 1743

The file() function reads an entire file into an array, all at once. I would imagine that is where things blow up. You probably want a second fopen() handle for your input file so you can read it one line at a time.

If your requirement is to handle this task with PHP, that's fine. But this type of thing is probably better left to something like awk.
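
As a rough sketch of the awk route (assuming awk is available on the host; the file names are placeholders), something like this would strip the first line without PHP ever holding the file in memory:

    // NR is awk's running line number, so 'NR != 1' prints every line
    // except the first into a temp file, which then replaces the original.
    $cmd = "awk 'NR != 1' xyz.csv > xyz.csv.tmp && mv xyz.csv.tmp xyz.csv";
    exec($cmd, $output, $status);

    if ($status !== 0) {
        // awk or mv failed; the original file is left untouched
        die('Could not strip the line.');
    }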

Upvotes: 0
