Reputation: 1856
I am trying to insert data from a PostgreSQL database into a MySQL database. There are about 100,000 records that I need to import. However, I always run into an out of memory error:
Out of memory (allocated 1705508864) (tried to allocate 222764 bytes)
I am using Laravel 5 to do this; here is the code:
// to avoid memory limit or timeout issues
ini_set('memory_limit', '-1');
ini_set('max_input_time', '-1');
ini_set('max_execution_time', '0');
set_time_limit(0);

// this speeds things up a bit
DB::disableQueryLog();

$importableModels = [
    // array of table names
];

$failedChunks = 0;

foreach ($importableModels as $postGresModel => $mysqlModel) {
    $total = $postGresModel::count();
    $chunkSize = getChunkSize($total);

    // customize chunk size for certain tables to avoid the "too many placeholders" error
    if ($postGresModel === 'ApplicationFormsPostgres') {
        $chunkSize = 300;
    }

    $class = 'App\\Models\\' . $mysqlModel;
    $object = new $class;

    // truncate previous data
    Eloquent::unguard();
    DB::statement('SET FOREIGN_KEY_CHECKS=0;');
    $object->truncate();
    DB::statement('SET FOREIGN_KEY_CHECKS=1;');
    Eloquent::reguard();

    $postGresModel::chunk($chunkSize, function ($chunk) use ($postGresModel, $mysqlModel, $failedChunks, $object) {
        // make any adjustments
        $fixedChunk = $chunk->map(function ($item, $key) use ($postGresModel) {
            $appendableAttributes = $postGresModel::APPEND_FIELDS;
            $attributes = $item->getAttributes();

            // replace null/no values with empty string
            foreach ($attributes as $key => $attribute) {
                if ($attribute === null) {
                    $attributes[$key] = '';
                }
            }

            // add customized attributes and values
            foreach ($appendableAttributes as $appendField) {
                if ($appendField === 'ssn') {
                    $value = $attributes['number'];
                    $attributes[$appendField] = substr($value, 0, 4);
                } else {
                    $attributes[$appendField] = '';
                }
            }

            return $attributes;
        });

        // insert chunk of data in db now
        if (!$object->insert($fixedChunk->toArray())) {
            $failedChunks++;
        }
    });
}
}
The memory issue appears only after about 80,000 rows have been inserted, not before. I suspect something is wrong with the collection map function or the loops inside it. I have even tried setting the memory and time-limit settings to unlimited, but to no avail. Maybe I need to use reference variables or something, but I am not sure how.
Can any optimizations be made in the above code to reduce memory usage?
Or how do I efficiently import a large amount of data from a large PostgreSQL database to MySQL through code?
Can anyone tell me what I am doing wrong here, or why the whole memory gets consumed?
PS: I am doing this on a local development machine with 4 GB of RAM (Windows 8). PHP version: 5.6.16.
Upvotes: 9
Views: 3964
Reputation: 4350
1.- Try commenting out the content of the data-processing logic to check whether the memory leak is within this code:
$postGresModel::chunk($chunkSize, function ($chunk) use ($postGresModel, $mysqlModel, $failedChunks, $object) {
    // make any adjustments
    $fixedChunk = $chunk->map(function ($item, $key) use ($postGresModel) {
        // nothing to do
    });
});
2.- If you get the same error, the memory leak could be produced by the MySQL driver (PDO?) buffering all rows of the query result in memory.
As described in PostgreSQL unbuffered queries and PHP (cursors), you could change how PostgreSQL fetches rows by using a cursor:
$curSql = "DECLARE cursor1 CURSOR FOR SELECT * FROM big_table";
$con = new PDO("pgsql:host=dbhost dbname=database", "user", "pass");
$con->beginTransaction(); // cursors require a transaction.
$stmt = $con->prepare($curSql);
$stmt->execute();
$innerStatement = $con->prepare("FETCH 1 FROM cursor1");
while($innerStatement->execute() && $row = $innerStatement->fetch(PDO::FETCH_ASSOC)) {
echo $row['field'];
}
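If one row per FETCH is too slow, the same cursor can also be read in blocks; here is a minimal sketch of that variant (the block size of 1000 and the processing comment are assumptions, not part of the original answer):
$blockStmt = $con->prepare("FETCH 1000 FROM cursor1");
do {
    $blockStmt->execute();
    $rows = $blockStmt->fetchAll(PDO::FETCH_ASSOC);

    foreach ($rows as $row) {
        // process / insert $row into MySQL here
    }
} while (count($rows) > 0);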
Upvotes: 1
Reputation: 2350
When you grab your PostgreSQL data, try to LIMIT the size of what gets returned (http://www.postgresql.org/docs/8.1/static/queries-limit.html) to something reasonable and then iterate.
Say, for instance, you took 20,000 rows at a time: you would do 'SELECT .. BLAH .. LIMIT 20000 OFFSET 0', then the next iteration would be 'SELECT .. BLAH .. LIMIT 20000 OFFSET 20000' (the OFFSET being 20000 * your loop counter).
Process these batches until you have no rows left.
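A minimal sketch of that loop, assuming a plain PDO connection $pgsql to the source PostgreSQL database; the table name and ORDER BY key are placeholders (a stable ORDER BY is needed so consecutive pages do not overlap):
$batchSize = 20000;
for ($offset = 0; ; $offset += $batchSize) {
    $stmt = $pgsql->prepare('SELECT * FROM source_table ORDER BY id LIMIT :limit OFFSET :offset');
    $stmt->bindValue(':limit', $batchSize, PDO::PARAM_INT);
    $stmt->bindValue(':offset', $offset, PDO::PARAM_INT);
    $stmt->execute();

    $rows = $stmt->fetchAll(PDO::FETCH_ASSOC);
    if (empty($rows)) {
        break; // no rows left
    }

    // insert $rows into MySQL here, ideally as one multi-row INSERT per batch
}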
Upvotes: 1
Reputation: 21492
Definitely, you've got a memory leak somewhere. I guess it is somewhere within $chunk->map() or $object->insert($fixedChunk->toArray()). We can only guess, because the implementation is hidden.
However, I would use generators as much as possible. The code might look something like the following:
function getAllItems() {
    $step = 2000;

    for ($offset = 0; ; $offset += $step) {
        // LIMIT ... OFFSET ... works on both PostgreSQL and MySQL
        $q = "SELECT * FROM items_table LIMIT $step OFFSET $offset";

        if (! $items = Db::fetchAll($q)) {
            break;
        }

        foreach ($items as $i) {
            yield $i;
        }
    }
}

foreach (getAllItems() as $item) {
    import_item($item);
}
I dare to say that with generators you'll be able to import practically any amount of data from one database to another.
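One possible way to wire the generator into the import, sketched under the assumption that the target is an Eloquent model class (App\Models\SomeMysqlModel here is hypothetical) and that batches of 500 rows are acceptable:
$batch = [];
foreach (getAllItems() as $item) {
    $batch[] = $item;

    if (count($batch) >= 500) {
        \App\Models\SomeMysqlModel::insert($batch); // one multi-row INSERT per batch
        $batch = [];
    }
}

if (!empty($batch)) {
    \App\Models\SomeMysqlModel::insert($batch); // flush the remainder
}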
Upvotes: 2
Reputation: 11106
map will return a new instance of your collection, and GC will clean that up too late.
Try replacing
$chunk = $chunk->map(function...
with
$newchunk = $chunk->map(function...
and, of course, use the new chunk when inserting: $object->insert($newchunk->toArray()). You can use transform instead of map, too.
GC should collect it now, but you can add an unset($newchunk); after the insert to make sure. An unset($object); in the second-to-last line of your code wouldn't hurt either.
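Put together, the chunk callback from the question might then look roughly like this (a trimmed-down sketch; the mapping adjustments from the question are elided):
$postGresModel::chunk($chunkSize, function ($chunk) use ($object) {
    $newchunk = $chunk->map(function ($item) {
        return $item->getAttributes(); // ...plus the adjustments from the question
    });

    $object->insert($newchunk->toArray());
    unset($newchunk); // release the mapped copy as soon as it is no longer needed
});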
Upvotes: 0
Reputation: 4265
A few suggestions.
You create a new $object object at each loop iteration. Depending on the actual structure of the MySqlModel and the number of items, it can definitely use a lot of memory (also because the GC has not yet run; see the second suggestion). Set it to NULL at the end of each loop, i.e. $object = NULL;
Upvotes: 0
Reputation: 142218
Yes, you could change the 'memory_limit'. But that only works today, not tomorrow, when you will need even more memory.
Plan A:
Instead, write a little more code... Chunk up the data into, say, 1000 rows at a time. Build a single INSERT
statement with all the rows in it. Execute it in a transaction by itself.
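A minimal sketch of Plan A, assuming a plain PDO handle $mysql to the target MySQL database; insertBatch() and its arguments are hypothetical, and the rows are expected to be associative arrays sharing the same keys:
// Hypothetical helper: inserts one batch of rows with a single multi-row
// INSERT, wrapped in its own transaction.
function insertBatch(PDO $mysql, $table, array $rows)
{
    if (empty($rows)) {
        return;
    }

    $columns = array_keys($rows[0]); // assumed identical for every row and trusted
    $rowPlaceholder = '(' . implode(',', array_fill(0, count($columns), '?')) . ')';

    $sql = 'INSERT INTO ' . $table . ' (' . implode(',', $columns) . ') VALUES '
         . implode(',', array_fill(0, count($rows), $rowPlaceholder));

    $values = [];
    foreach ($rows as $row) {
        foreach ($columns as $column) {
            $values[] = $row[$column];
        }
    }

    $mysql->beginTransaction();
    $mysql->prepare($sql)->execute($values);
    $mysql->commit();
}
Keep each batch small enough (say, 1000 rows) that the total number of placeholders stays under MySQL's limit of 65,535 per prepared statement.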
Plan B:
Build a CSV file of all the rows, then use LOAD DATA INFILE
to do the mass insert.
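And a minimal sketch of Plan B, assuming $rows is an iterable of associative arrays, target_table is a placeholder name, and $mysql is a PDO connection opened with PDO::MYSQL_ATTR_LOCAL_INFILE => true (the server must also allow local_infile):
$path = tempnam(sys_get_temp_dir(), 'import_');
$fh = fopen($path, 'w');
foreach ($rows as $row) {
    fputcsv($fh, array_values($row)); // comma-separated fields, quoted only where needed
}
fclose($fh);

$mysql->exec(
    "LOAD DATA LOCAL INFILE " . $mysql->quote($path) . "
     INTO TABLE target_table
     FIELDS TERMINATED BY ',' ENCLOSED BY '\"'
     LINES TERMINATED BY '\\n'"
);
unlink($path);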
In either Plan, avoid loading all the rows into RAM at once. There is a lot of overhead for scalars and arrays in PHP.
Upvotes: 4