Reputation: 71
I'm trying to speed up this this script for importing CSV file to MySQL. For 1000 rows it takes 130 seconds to load it. Tried to use it for 30 000 rows and after 20 minutes It timed out and loaded 8681 rows.
CSV header looks like (could be any number of columns in any order):
email;param1;..;paramX
[email protected];something;..;value
MySQL create mail_queue:
CREATE TABLE IF NOT EXISTS `mail_queue` (
`mail_queue_id` INT NOT NULL AUTO_INCREMENT,
`mailer_batch_id` INT NOT NULL,
`to` VARCHAR(100) NOT NULL,
`priority` INT NOT NULL DEFAULT 0,
`created` DATETIME NOT NULL DEFAULT NOW(),
`mail_status_id` INT NOT NULL,
PRIMARY KEY (`mail_queue_id`),
INDEX `fk_mail_queue_mailer_batch1_idx` (`mailer_batch_id` ASC),
INDEX `fk_mail_queue_mail_status1_idx` (`mail_status_id` ASC),
CONSTRAINT `fk_mail_queue_mailer_batch1`
FOREIGN KEY (`mailer_batch_id`)
REFERENCES `mailer_batch` (`mailer_batch_id`)
ON DELETE CASCADE
ON UPDATE NO ACTION,
CONSTRAINT `fk_mail_queue_mail_status1`
FOREIGN KEY (`mail_status_id`)
REFERENCES `mail_status` (`mail_status_id`)
ON DELETE NO ACTION
ON UPDATE NO ACTION)
ENGINE = InnoDB;
MySQL create mail_param:
CREATE TABLE IF NOT EXISTS `mail_param` (
`mail_param_id` INT NOT NULL AUTO_INCREMENT,
`mail_queue_id` INT NOT NULL,
`param_key` VARCHAR(45) NOT NULL,
`param_value` VARCHAR(45) NOT NULL,
PRIMARY KEY (`mail_param_id`),
INDEX `fk_mail_param_mail_queue1_idx` (`mail_queue_id` ASC),
CONSTRAINT `fk_mail_param_mail_queue1`
FOREIGN KEY (`mail_queue_id`)
REFERENCES `mail_queue` (`mail_queue_id`)
ON DELETE CASCADE
ON UPDATE NO ACTION)
ENGINE = InnoDB;
Code (Zend framework). Works well, but slow:
if (($handle = fopen($this->filepath, 'r')) !== false)
{
// DB
$mailQueueTable = new Application_Model_DbTable_MailQueue();
$mailParamTable = new Application_Model_DbTable_MailParam();
// Get header
$header = \ForceUTF8\Encoding::toUTF8(fgetcsv($handle, 0, ';'));
while(($data = fgetcsv($handle, 0, ';')) !== false)
{
// Save e-mail to e-mail queue
$mailQueueRow = $mailQueueTable->createRow();
$mailQueueRow->mailer_batch_id = $mailerBatchId;
$mailQueueRow->to = $data[$this->emailColumn];
$mailQueueRow->priority = 0;
$mailQueueRow->created = $created->toString('yyyy-MM-dd HH:mm:ss');
$mailQueueRow->mail_status_id = 1;
$mailQueueId = $mailQueueRow->save();
// Save e-mail params
foreach ($data as $key => $value) {
$mailParamRow = $mailParamTable->createRow();
$mailParamRow->mail_queue_id = $mailQueueId;
$mailParamRow->param_key = $header[$key];
$mailParamRow->param_value = \ForceUTF8\Encoding::toUTF8($value);
$mailParamRow->save();
}
unset($data);
}
fclose($handle);
}
I experimented with LOAD DATA INTO, but I can't use it because of mail_param table structure.
1) Create temporary table (OK)
$columns = "";
foreach ($this->header as $item) {
if ($columns == "") {
$columns = "`" . $item . "` VARCHAR(45)";
} else {
$columns .= ", `" . $item . "` VARCHAR(45)";
}
}
$query = 'CREATE TEMPORARY TABLE `tmp_csv_import` (
`id` int AUTO_INCREMENT,
' . $columns . '
) ENGINE MyISAM;';
2) LOAD DATA INFILE (OK)
$query = "LOAD DATA INFILE '" . $this->filepath . "'
INTO TABLE `tmp_csv_import`
FIELDS TERMINATED BY ';'
ENCLOSED BY '\"'
LINES TERMINATED BY '\n'
IGNORE 1 ROWS;";
3) Insert into mail_queue (OK)
// $mailerBatchId from last_insert_id()
$query = "INSERT INTO `mail_queue` (`mailer_batch_id`, `to`, `priority`, `created`, `mail_status_id`)
SELECT " . $mailerBatchId . ", `email`, 0, NOW(), 1 FROM `tmp_csv_import`";
4) Insert into mail_param (???)
I don't know what to write here. I need to insert new row for each column in table tmp_csv_import. And I need to get mail_queue_id - foreign key for table mail_param.
$query = "INSERT INTO mail_param (mail_queue_id, param_key, param_value)
SELECT ??? FROM `tmp_csv_import`";
Is it possible to do this in MySQL? Or should I do this different way?
Upvotes: 1
Views: 953
Reputation: 71
I found solution for my problem. There is no need to use a temporary table.
1) Data to "mail_queue" will be loaded using this code:
$query = "LOAD DATA INFILE '" . $this->filepath . "'
INTO TABLE `mail_queue`
FIELDS TERMINATED BY ';'
ENCLOSED BY '\"'
LINES TERMINATED BY '\n'
IGNORE 1 ROWS
(@dummy, @dummy, email, @dummy)
SET `mailer_batch_id` = " . (int) $mailerBatchId . ",
`priority` = 0,
`created` = NOW(),
`mail_status_id` = 1;";
For import only one row, will be generated by header
(@dummy, @dummy, email, @dummy)
Other values will be set by SET.
2) I will select "mail_queue_id" from inserted values and insert it into array this style:
array('to' => 'mail_queue_id');
3) I will create temporary CSV file from source file. Structure:
mail_queue_id;key;value
4) Data will be loaded into "mail_param" table:
$query = "LOAD DATA INFILE " . $tmpFilepath . "
INTO TABLE `mail_param`
FIELDS TERMINATED BY ';'
ENCLOSED BY '\"'
LINES TERMINATED BY '\n'
(`mail_queue_id`, `param_key`, `param_value`);";
5)
unlink($tmpFilepath)
6) It's done. I tried to load CSV with 30 000 rows and it is significantly faster. (< 1s).
Upvotes: 1