Reputation: 27082
I have a CSV file with more than 100.000 lines, each line has 3 values separated by semicolon. Total filesize is approx. 5MB.
CSV file is in this format:
stock_id;product_id;amount
==========================
1;1234;0
1;1235;1
1;1236;0
...
2;1234;3
2;1235;2
2;1236;13
...
3;1234;0
3;1235;2
3;1236;0
...
We have 10 stocks which are indexed 1-10 in CSV. In database we have them saved as 22-31.
CSV is sorted by stock_id, product_id but I think it doesn't matter.
What I have
<?php
session_start();
require_once ('db.php');
echo '<meta charset="iso-8859-2">';
// convert table: `CSV stock id => DB stock id`
$stocks = array(
1 => 22,
2 => 23,
3 => 24,
4 => 25,
5 => 26,
6 => 27,
7 => 28,
8 => 29,
9 => 30,
10 => 31
);
$sql = $mysqli->query("SELECT product_id FROM table WHERE fielddef_id = 1");
while ($row = $sql->fetch_assoc()) {
$products[$row['product_id']] = 1;
}
$csv = file('export.csv');
// go thru CSV file and prepare SQL UPDATE query
foreach ($csv as $row) {
$data = explode(';', $row);
// $data[0] - stock_id
// $data[1] - product_id
// $data[2] - amount
if (isset($products[$data[1]])) {
// in CSV are products which aren't in database
// there is echo which should show me queries
echo " UPDATE t
SET value = " . (int)$data[2] . "
WHERE fielddef_id = " . (int)$stocks[$data[0]] . " AND
product_id = '" . $data[1] . "' -- product_id isn't just numeric
LIMIT 1<br>";
}
}
Problem is that writing down 100k lines by echo
is soooo slow, takes long minutes. I'm not sure what MySQL will do, if it will be faster, or take ± the same time. I have no testing machine here, so I'm worry about testing in on prod server.
My idea was to load CSV file into more variables (better array) like below, but I don't know why.
$csv[0] = lines 0 - 10.000;
$csv[1] = lines 10.001 - 20.000;
$csv[2] = lines 20.001 - 30.000;
$csv[3] = lines 30.001 - 40.000;
etc.
I found eg. Efficiently counting the number of lines of a text file. (200mb+), but I'm not sure how it can help me.
When I replace foreach
for print_r
, I get dump in < 1 sec. The task is to make the foreach loop with database update faster.
Any ideas how to updates so many records in database?
Thanks.
Upvotes: 3
Views: 4396
Reputation: 37045
Like I said in the comment, use SPLFileObject to iterate over the CSV file. Use Prepared statements to reduce performance overhead of calling the UPDATE in each loop. Also, merge your two queries together, there isn't any reason to pull all of the product rows first and check them against the CSV. You can use a JOIN to ensure that only those stocks in the second table that are related to the product in the first and that is the current CSV row will get updated:
/* First the CSV is pulled in */
$export_csv = new SplFileObject('export.csv');
$export_csv->setFlags(SplFileObject::READ_CSV | SplFileObject::DROP_NEW_LINE | SplFileObject::READ_AHEAD);
$export_csv->setCsvControl(';');
/* Next you prepare your statement object */
$stmt = $mysqli->prepare("
UPDATE stocks, products
SET value = ?
WHERE
stocks.fielddef_id = ? AND
product_id = ? AND
products.fielddef_id = 1
LIMIT 1
");
$stmt->bind_param('iis', $amount, $fielddef_id, $product_id);
/* Now you can loop through the CSV and set the fields to match the integers bound to the prepared statement and execute the update on each loop. */
foreach ($export_csv as $csv_row) {
list($stock_id, $product_id, $amount) = $csv_row;
$fielddef_id = $stock_id + 21;
if(!empty($stock_id)) {
$stmt->execute();
}
}
$stmt->close();
Upvotes: 2
Reputation: 27082
Due to answers and comments for the question, I have the solution. The base for that is from @Dave, I've only updated it to pass better to question.
<?php
require_once 'include.php';
// stock convert table (key is ID in CSV, value ID in database)
$stocks = array(
1 => 22,
2 => 23,
3 => 24,
4 => 25,
5 => 26,
6 => 27,
7 => 28,
8 => 29,
9 => 30,
10 => 31
);
// product IDs in CSV (value) and Database (product_id) are different. We need to take both IDs from database and create an array of e-shop products
$sql = mysql_query("SELECT product_id, value FROM cms_module_products_fieldvals WHERE fielddef_id = 1") or die(mysql_error());
while ($row = mysql_fetch_assoc($sql)) {
$products[$row['value']] = $row['product_id'];
}
$handle = fopen('import.csv', 'r');
$i = 1;
while (($data = fgetcsv($handle, 1000, ';')) !== FALSE) {
$p_id = (int)$products[$data[1]];
if ($p_id > 0) {
// if product exists in database, continue. Without this condition it works but we do many invalid queries to database (... WHERE product_id = 0 updates nothing, but take a time)
if ($i % 300 === 0) {
// optional, we'll see what it do with the real traffic
sleep(1);
}
$updatesql = "UPDATE table SET value = " . (int)$data[2] . " WHERE fielddef_id = " . $stocks[$data[0]] . " AND product_id = " . (int)$p_id . " LIMIT 1";
echo "$updatesql<br>";//for debug only comment out on live
$i++;
}
}
// cca 1.5sec to import 100.000k+ records
fclose($handle);
Upvotes: 3
Reputation: 10269
Updating every record every time will be too expensive (mostly due to seeks, but also from writing).
You should TRUNCATE
the table first and then insert all the records again (assuming you won't have external foreign keys linking to this table).
To make it even faster, you should lock the table before the insert and unlock it afterwards. This will prevent the indexing from happening at every insert.
Upvotes: 0
Reputation: 3288
Something like this (please note this is 100% untested and off top of my head may need some tweaking to actually work :) )
//define array may (probably better ways of doing this
$stocks = array(
1 => 22,
2 => 23,
3 => 24,
4 => 25,
5 => 26,
6 => 27,
7 => 28,
8 => 29,
9 => 30,
10 => 31
);
$handle = fopen("file.csv", "r")); //open file
while (($data = fgetcsv($handle, 1000, ";")) !== FALSE) {
//loop through csv
$updatesql = "UPDATE t SET `value` = ".$data[2]." WHERE fielddef_id = ".$stocks[$data[0]]." AND product_id = ".$data[1];
echo "$updatesql<br>";//for debug only comment out on live
}
There is no need to do your initial select since you're only ever setting your product data to 1 anyway in your code and it looks from your description that your product id's are always correct its just your fielddef column which has the map.
Also just for live don't forget to put your actual mysqli execute command in on your $updatesql;
To give you a comparison to actual usage code (I can benchmark against!) This is some code I use for an importer of an uploaded file (its not perfect but it does its job)
if (isset($_POST['action']) && $_POST['action']=="beginimport") {
echo "<h4>Starting Import</h4><br />";
// Ignore user abort and expand time limit
//ignore_user_abort(true);
set_time_limit(60);
if (($handle = fopen($_FILES['clientimport']['tmp_name'], "r")) !== FALSE) {
$row = 0;
//defaults
$sitetype = 3;
$sitestatus = 1;
$startdate = "2013-01-01 00:00:00";
$enddate = "2013-12-31 23:59:59";
$createdby = 1;
//loop and insert
while (($data = fgetcsv($handle, 10000, ",")) !== FALSE) { // loop through each line of CSV. Returns array of that line each time so we can hard reference it if we want.
if ($row>0) {
if (strlen($data[1])>0) {
$clientshortcode = mysqli_real_escape_string($db->mysqli,trim(stripslashes($data[0])));
$sitename = mysqli_real_escape_string($db->mysqli,trim(stripslashes($data[0]))." ".trim(stripslashes($data[1])));
$address = mysqli_real_escape_string($db->mysqli,trim(stripslashes($data[1])).",".trim(stripslashes($data[2])).",".trim(stripslashes($data[3])));
$postcode = mysqli_real_escape_string($db->mysqli,trim(stripslashes($data[4])));
//look up client ID
$client = $db->queryUniqueObject("SELECT ID FROM tblclients WHERE ShortCode='$clientshortcode'",ENABLE_DEBUG);
if ($client->ID>0 && is_numeric($client->ID)) {
//got client ID so now check if site already exists we can trust the site name here since we only care about double matching against already imported sites.
$sitecount = $db->countOf("tblsites","SiteName='$sitename'");
if ($sitecount>0) {
//site exists
echo "<strong style=\"color:orange;\">SITE $sitename ALREADY EXISTS SKIPPING</strong><br />";
} else {
//site doesn't exist so do import
$db->execute("INSERT INTO tblsites (SiteName,SiteAddress,SitePostcode,SiteType,SiteStatus,CreatedBy,StartDate,EndDate,CompanyID) VALUES
('$sitename','$address','$postcode',$sitetype,$sitestatus,$createdby,'$startdate','$enddate',".$client->ID.")",ENABLE_DEBUG);
echo "IMPORTED - ".$data[0]." - ".$data[1]."<br />";
}
} else {
echo "<strong style=\"color:red;\">CLIENT $clientshortcode NOT FOUND PLEASE ENTER AND RE-IMPORT</strong><br />";
}
fcflush();
set_time_limit(60); // reset timer on loop
}
} else {
$row++;
}
}
echo "<br />COMPLETED<br />";
}
fclose($handle);
unlink($_FILES['clientimport']['tmp_name']);
echo "All Imports finished do not reload this page";
}
That imported 150k rows in about 10 seconds
Upvotes: 3
Reputation: 5237
Make the query bigger, i.e. use the loop to compile a larger query. You may need to split it up into chunks (e.g. process 100 at a time), but certainly don't do one query at a time (applies for any kind, insert, update, even select if possible). This should greatly increase the performance.
It's generally recommended that you don't query in a loop.
Upvotes: 0