dev02

Reputation: 1856

PHP Getting out of memory

I am trying to insert data from a PostgreSQL database into a MySQL database. There are about 100,000 records that I need to import. However, I always run into an out of memory error:

Out of memory (allocated 1705508864) (tried to allocate 222764 bytes)

I am using Laravel 5 to do this; here is the code:

// to avoid memory limit or timeout issues
ini_set('memory_limit', '-1');
ini_set('max_input_time', '-1');
ini_set('max_execution_time', '0');
set_time_limit(0);

// this speeds up things a bit
DB::disableQueryLog();

$importableModels = [
    // array of table names
];

$failedChunks = 0;

foreach ($importableModels as $postGresModel => $mysqlModel) {

    $total = $postGresModel::count();
    $chunkSize = getChunkSize($total);

    // customize chunk size for certain tables to avoid the "too many placeholders" error
    if ($postGresModel === 'ApplicationFormsPostgres') {
        $chunkSize = 300;
    }

    $class = 'App\\Models\\' . $mysqlModel;
    $object = new $class;

    // truncate previous data
    Eloquent::unguard();
    DB::statement('SET FOREIGN_KEY_CHECKS=0;');
    $object->truncate();
    DB::statement('SET FOREIGN_KEY_CHECKS=1;');
    Eloquent::reguard();

    $postGresModel::chunk($chunkSize, function ($chunk) use ($postGresModel, $mysqlModel, $failedChunks, $object) {

        // make any adjustments
        $fixedChunk = $chunk->map(function ($item, $key) use ($postGresModel) {

            $appendableAttributes = $postGresModel::APPEND_FIELDS;
            $attributes = $item->getAttributes();

            // replace null/no values with empty string
            foreach ($attributes as $key => $attribute) {
                if ($attribute === null) {
                    $attributes[$key] = '';
                }
            }

            // add customized attributes and values
            foreach ($appendableAttributes as $appendField) {
                if ($appendField === 'ssn') {
                    $value = $attributes['number'];
                    $attributes[$appendField] = substr($value, 0, 4);
                } else {
                    $attributes[$appendField] = '';
                }

            }

            return $attributes;
        });

        // insert chunk of data in db now
        if (!$object->insert($fixedChunk->toArray())) {
            $failedChunks++;
        }

    });    
}

The memory issue only appears after about 80,000 rows have been inserted, not before that.

I suspect something is wrong with the collection map function, or with the loops inside the map function. I have even tried setting the memory and time limit settings to unlimited, but to no avail. Maybe I need to use reference variables or something, but I am not sure how.

Can any optimizations be made to the above code to reduce memory usage?

Or how do I efficiently import large amounts of data from a large PostgreSQL database to MySQL through code?

Can anyone tell me what I am doing wrong here, or why the whole memory gets consumed?

PS: I am doing this on a local development machine with 4 GB RAM (Windows 8). PHP version: 5.6.16.

Upvotes: 9

Views: 3964

Answers (6)

F.Igor

Reputation: 4350

1. Try commenting out the content of the data-processing logic to check whether the memory leak is within this code:

$postGresModel::chunk($chunkSize, function ($chunk) use ($postGresModel, $mysqlModel, $failedChunks, $object) {

    // make any adjustments
    $fixedChunk = $chunk->map(function ($item, $key) use ($postGresModel) {
        // nothing to do
    });
});

2. If you still get the same error, the memory leak could be produced by the MySQL driver (PDO?) buffering all rows of the query result in memory.

As in PostgreSQL unbuffered queries and PHP (cursors), you could change how PostgreSQL fetches rows by using a cursor:

$curSql = "DECLARE cursor1 CURSOR FOR SELECT * FROM big_table";
$con = new PDO("pgsql:host=dbhost dbname=database", "user", "pass");
$con->beginTransaction(); // cursors require a transaction.
$stmt = $con->prepare($curSql);
$stmt->execute();

$innerStatement = $con->prepare("FETCH 1 FROM cursor1");

while($innerStatement->execute() && $row = $innerStatement->fetch(PDO::FETCH_ASSOC)) {
    echo $row['field'];
}

Upvotes: 1

Alfie

Reputation: 2350

When you grab your PostgreSQL data, try to LIMIT the size of what gets returned (http://www.postgresql.org/docs/8.1/static/queries-limit.html) to something reasonable and then iterate.

Say, for instance, you took 20,000 rows at a time: you would do 'SELECT .. BLAH .. LIMIT 20000 OFFSET 0', then the next iteration would be 'SELECT .. BLAH .. LIMIT 20000 OFFSET 20000' (the OFFSET being 20000 * your loop counter).

Process these batches until you have no rows left.
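A rough sketch of that loop (assuming $pg is an existing PDO connection to PostgreSQL, source_table is the table being exported, and importBatch() is a hypothetical helper that writes one batch to MySQL) could look like this:

$batchSize = 20000;
$offset = 0;

while (true) {
    // fetch the next batch from PostgreSQL
    $stmt = $pg->prepare("SELECT * FROM source_table LIMIT :limit OFFSET :offset");
    $stmt->bindValue(':limit', $batchSize, PDO::PARAM_INT);
    $stmt->bindValue(':offset', $offset, PDO::PARAM_INT);
    $stmt->execute();

    $rows = $stmt->fetchAll(PDO::FETCH_ASSOC);
    if (!$rows) {
        break; // no rows left, we are done
    }

    importBatch($rows); // hypothetical helper: insert this batch into MySQL
    unset($rows);       // release the batch before fetching the next one

    $offset += $batchSize;
}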

Upvotes: 1

Ruslan Osmanov

Reputation: 21492

Definitely, you've got a memory leak somewhere. I guess it's somewhere within $chunk->map() or $object->insert($fixedChunk->toArray()). We can only guess, because the implementation is hidden.

However, I would use generators as much as possible. The code might look something like the following:

function getAllItems() {
  $step = 2000;

  for ($offset = 0 ;; $offset += $step) {
    $q = "SELECT * FROM items_table LIMIT $offset, $step";

    if (! $items = Db::fetchAll($q)) {
      break;
    }

    foreach ($items as $i) {
      yield $i;
    }
  }
}

foreach (getAllItems() as $item) {
  import_item($item);
}

I dare to say that with generators you'll be able to import practically any amount of data from one database to another.

Upvotes: 2

Solarflare

Reputation: 11106

map will return a new instance of your collection. GC will clean that up too late.

Try replacing

$chunk = $chunk->map(function... 

with

$newchunk = $chunk->map(function... 

and of course use the new chunk when inserting, $object->insert($newchunk->toArray()). You can use transform instead of map too.

GC should collect it now, but you can add an unset($newchunk); after the insert to make sure. An unset($object); in the second to last line of your code wouldn't hurt either.
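A minimal sketch of the renamed-variable version inside the chunk callback, keeping the mapping logic from the question:

$newchunk = $chunk->map(function ($item, $key) use ($postGresModel) {
    // ... same attribute adjustments as in the question ...
    return $item->getAttributes();
});

if (!$object->insert($newchunk->toArray())) {
    $failedChunks++;
}

unset($newchunk); // free the mapped copy right away instead of waiting for GC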

Upvotes: 0

Francesco Abeni

Reputation: 4265

A few suggestions.

  • You instantiate a new $object at each loop iteration. Depending on the actual structure of the MySqlModel and the number of items, it can definitely use a lot of memory (also because the GC has not yet run, see the second suggestion). Set it to NULL at the end of each iteration, i.e.

$object = NULL;

  • If execution time is not an issue, insert a little delay between loop iterations. This allows the PHP garbage collector to do some work and free up unused resources (see the sketch below).
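Put together, the end of each loop iteration might look like this sketch (the one-second sleep is just an example value, and the explicit gc_collect_cycles() call is an optional extra not mentioned above):

// at the end of each iteration over $importableModels
$object = NULL;      // drop the model instance so its memory can be reclaimed
gc_collect_cycles(); // optionally nudge the garbage collector
sleep(1);            // small pause between iterations (example value)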

Upvotes: 0

Rick James

Reputation: 142218

Yes, you could change the 'memory_limit'. But that only works today, not tomorrow, when you will need even more memory.

Plan A:

Instead, write a little more code... Chunk up the data into, say, 1000 rows at a time. Build a single INSERT statement with all the rows in it. Execute it in a transaction by itself.
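A rough sketch of Plan A, assuming $mysql is a PDO connection to MySQL, target_table is the destination table, and $rows is one batch of about 1000 associative arrays already read from PostgreSQL:

// build one multi-row INSERT for the whole batch
$columns = array_keys($rows[0]);
$placeholderRow = '(' . implode(',', array_fill(0, count($columns), '?')) . ')';
$sql = 'INSERT INTO target_table (' . implode(',', $columns) . ') VALUES '
     . implode(',', array_fill(0, count($rows), $placeholderRow));

// flatten the batch into a single parameter list
$params = [];
foreach ($rows as $row) {
    foreach ($columns as $col) {
        $params[] = $row[$col];
    }
}

// execute the batch in its own transaction
$mysql->beginTransaction();
$mysql->prepare($sql)->execute($params);
$mysql->commit();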

Plan B:

Build a CSV file of all the rows, then use LOAD DATA INFILE to do the mass insert.
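And a sketch of Plan B, streaming the rows into a temporary CSV file with fputcsv() and bulk-loading it with LOAD DATA LOCAL INFILE (this needs the local-infile option enabled on both the client connection and the server; fetchRowsFromPostgres() is a hypothetical generator over the source rows):

// write the rows to a CSV file without holding them all in memory
$csvPath = tempnam(sys_get_temp_dir(), 'import_');
$fh = fopen($csvPath, 'w');
foreach (fetchRowsFromPostgres() as $row) { // hypothetical generator over the source rows
    fputcsv($fh, $row);
}
fclose($fh);

// bulk-load the file into MySQL with a single statement
$sql = "LOAD DATA LOCAL INFILE " . $mysql->quote($csvPath) . "
        INTO TABLE target_table
        FIELDS TERMINATED BY ',' ENCLOSED BY '\"'
        LINES TERMINATED BY '\\n'";
$mysql->exec($sql);

unlink($csvPath); // clean up the temporary file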

In either Plan, avoid loading all the rows into RAM at once. There is a lot of overhead for scalars and arrays in PHP.

Upvotes: 4
