Gordon Cowtan

Reputation: 77

Importing a large dataset into MySQL using Yii2

I'm developing a dashboard using Yii2. The data is held in a remote SQL Server database, and I'm copying it into a local MySQL database. The data is constantly updated with the addition of new rows, and I need to refresh the local version at least once a day. There are currently about 1.5 million rows in the relevant table, though each row doesn't contain much data.

In the short term, I have no control over the structure of the remote table. It has no primary key set, and the rows aren't time-stamped either, so (I think) I have no way of knowing at the time of each update which rows are new and which ones already existed.

My approach is to take a complete copy of the remote table every 24 hours. The problem is that PHP runs out of memory when I do this. I found the Yii2 yii\db\Query->batch() function (http://www.yiiframework.com/doc-2.0/yii-db-query.html#batch()-detail), which looked like it should do the job, but I'm not sure how to use it, and when I've tried it so far, it runs out of memory as well.

What I've got so far looks like this -

foreach ($query->batch(1000) as $rows) {
    // do some stuff with this batch of rows
}

So I suppose I have two questions -

  1. Is the batch() function the right thing to use?

  2. How should I be using it if the above doesn't work?

Upvotes: 1

Views: 1014

Answers (1)

Muhammad Omer Aslam

Reputation: 23738

You need to call

Yii::$app->db->pdo->setAttribute(PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, false);

before the query and then start the batch. In other words, use setAttribute(PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, false) for the batch() run, and afterwards set the attribute back to true for other queries.
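A minimal sketch of the whole pattern, assuming the copy runs against a MySQL connection; the table name remote_copy is hypothetical, so adjust the query to whatever you are actually importing:

use yii\db\Query;

// Switch the connection to unbuffered mode so the result set stays on the
// server and is streamed to PHP one batch at a time.
Yii::$app->db->pdo->setAttribute(PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, false);

$query = (new Query())->from('remote_copy'); // hypothetical table name

foreach ($query->batch(1000) as $rows) {
    // process up to 1000 rows at a time, e.g. write them to the local table
}

// Restore buffered mode for the rest of the application's queries.
Yii::$app->db->pdo->setAttribute(PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, true);

Note that while an unbuffered result set is open, no other query can run on that same connection, so any inserts inside the loop should go through a different connection.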

According to the docs:

When working with large amounts of data, methods such as [[yii\db\Query::all()]] are not suitable because they require loading the whole query result into the client's memory. To solve this issue Yii provides batch query support. The server holds the query result, and the client uses a cursor to iterate over the result set one batch at a time.

The major reasons, from GitHub:

As a note: this will happen all the time on the default install of PHP7 now due to how mysqlnd now works by adding the result set's memory usage to the process's own: http://php.net/manual/en/mysqlinfo.concepts.buffering.php

When using libmysqlclient as library PHP's memory limit won't count the memory used for result sets unless the data is fetched into PHP variables. With mysqlnd the memory accounted for will include the full result set.

And mysqlnd is the recommended lib for the PDO extension now, while libmysqlclient is now "not recommended"

A better alternative to the workaround above is to use an unbuffered connection to the database, as in the function provided by IHIPOP@Github; see the full issue for details.
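One way to set that up, as a sketch: define a second connection component that is identical to the main db component except that buffering is disabled, and use it only for the batch reads. The component name unbufferedDb and the credentials below are assumptions, not part of the linked issue:

// config/web.php or config/console.php
'components' => [
    // ... your existing 'db' component ...
    'unbufferedDb' => [                      // hypothetical component name
        'class' => 'yii\db\Connection',
        'dsn' => 'mysql:host=localhost;dbname=dashboard', // same DSN as 'db'
        'username' => 'user',
        'password' => 'secret',
        'charset' => 'utf8',
        'attributes' => [
            PDO::MYSQL_ATTR_USE_BUFFERED_QUERY => false,
        ],
    ],
],

Then pass that connection to batch() explicitly:

foreach ($query->batch(1000, Yii::$app->unbufferedDb) as $rows) {
    // inserts into the local table can still run on the default,
    // buffered Yii::$app->db connection here
}

This keeps the main connection buffered, so the rest of the application is unaffected.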

Upvotes: 1
