Reputation: 21
I need to upload a large CSV file to MySQL. If the upload breaks, it has to resume from where it stopped, with no duplicate entries, and the restart should happen automatically on the next run (i.e. if the upload breaks after 123 entries, it should resume from entry 124).
CSV file format:
latitude longitude
6.5486 72.456
4.2186 74.466
5.5486 82.956
I only need one entry for each latitude/longitude pair. I'm currently using the code below (which works), but I don't know how to resume from the breaking point if the upload breaks.
<?php
error_reporting(0);
require("connection.php"); // connect to the database

if ($_FILES['csv']['size'] > 0) {
    // Get the uploaded CSV file
    $file  = $_FILES['csv']['tmp_name'];
    $fname = $_FILES['csv']['name'];
    $parts = explode('.', strtolower($fname));
    $ftype = end($parts); // end() needs a variable, not an expression

    if ($ftype == "csv") {
        $handle = fopen($file, "r");
        // Loop through the CSV file and insert into the database
        while (($data = fgetcsv($handle, 1000, ",", "'")) !== false) {
            if ($data[0]) {
                $latitude  = addslashes($data[0]);
                $longitude = addslashes($data[1]);
                $location1 = addslashes($data[2]);
                $location2 = addslashes($data[3]);
                $location3 = addslashes($data[4]);
                $sql = "SELECT * FROM latitude_longitude
                        WHERE latitude = '$latitude' AND longitude = '$longitude'";
                $result = mysql_query($sql);
                if (mysql_num_rows($result) > 0) {
                    // Pair already exists: refresh the locations, bump the counter
                    mysql_query("UPDATE latitude_longitude
                                 SET location1 = '$location1',
                                     location2 = '$location2',
                                     location3 = '$location3',
                                     status = status + 1
                                 WHERE latitude = '$latitude' AND longitude = '$longitude'");
                } else {
                    mysql_query("INSERT INTO latitude_longitude
                                 (latitude, longitude, location1, location2, location3, status, date)
                                 VALUES ('$latitude', '$longitude', '$location1',
                                         '$location2', '$location3', '1', CURRENT_TIMESTAMP)");
                }
            }
        }
        fclose($handle);
        header('Location: GeoLocation.php?success=1'); die;
    } else {
        header('Location: GeoLocation.php?success=2'); die;
    }
}
?>
Thanks in advance for your help.
Upvotes: 2
Views: 524
Reputation: 57408
Note: you're strongly advised to use PDO, as the mysql_* functions are being phased out. You will also be able to get rid of all those addslashes() calls!
For the insert, you can proceed like this:
Keep some information about the CSV in the session:
$import = array(
    'current' => 0,
    'total'   => 0, // estimated number of records
    'begun'   => 0, // timestamp of when the import started
    'fpos'    => 0, // current offset into the CSV file
    'flen'    => 0, // total CSV file size
    'errors'  => 0, // consecutive errors so far
);
and create a UNIQUE INDEX covering the fields for which you want no duplicates (CREATE UNIQUE INDEX ...).
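For the table in the question, the index might look like this (the index name is illustrative):

```sql
-- A second row with the same (latitude, longitude) pair will now be
-- rejected, or silently skipped when inserted with INSERT IGNORE.
CREATE UNIQUE INDEX idx_lat_lng
    ON latitude_longitude (latitude, longitude);
```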
When you import from the CSV:
- open $csv file
- fseek() the file to the $session['fpos'] offset
- MySQL set AUTOCOMMIT to off;
- MySQL BEGIN WORK;
- Get current time plus 10 seconds into $ttl
- loop
- read one record using, say, fgetcsv()
- try
- insert into the DB using INSERT IGNORE
- catch PDO error
- MySQL ROLLBACK, $session['errors']++ and immediately die().
- is time() equal or above $ttl? If so, break
- update the session object: set its 'errors' to 0 and store ftell() of the $csv file into its 'fpos'
- MySQL COMMIT;
- Your ETA is $session['begun'] + (time()-$session['begun'])*($session['flen']/$session['fpos']).
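The loop above can be sketched in plain PHP with the database calls stubbed out as comments (the processBatch() helper, its $state array, and the TTL parameter are illustrative names, not part of the original code):

```php
<?php
// Sketch of the resumable, time-boxed batch loop described above.
// The PDO transaction and INSERT IGNORE are marked as comments;
// everything else runs as written.

function processBatch($csvPath, array $state, $ttl = 10) {
    $handle = fopen($csvPath, "r");
    fseek($handle, $state['fpos']);      // resume where the last run stopped
    $deadline = time() + $ttl;           // work for roughly $ttl seconds

    // MySQL: SET autocommit = 0; BEGIN WORK;
    while (($data = fgetcsv($handle, 1000, ",")) !== false) {
        // PDO: INSERT IGNORE INTO latitude_longitude ... using $data
        // on PDOException: ROLLBACK, $state['errors']++, die()
        $state['current']++;
        if (time() >= $deadline) {
            break;                       // stop here, commit this batch
        }
    }

    $state['errors'] = 0;                // the whole batch went through
    $state['fpos']   = ftell($handle);   // next run resumes from here
    fclose($handle);
    // MySQL: COMMIT;
    return $state;                       // caller stores this in $_SESSION
}
```

Calling processBatch() repeatedly with the state returned by the previous call walks through the file in batches; a run that dies before the session is updated simply replays its batch, and the unique index (with INSERT IGNORE) absorbs the replayed rows.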
With the above method, each run of the script will take somewhat more than 10 seconds.
If the inserts go well, the new session will contain the next file offset from which to start.
If an error ensues, the whole transaction block rolls back, and it is as if the batch never started. You keep track of consecutive errors (a long run of them might mean something is wrong in the CSV).
You can do this and return the session object as JSON. Then you can drive the import by calling the above script repeatedly through, say, jQuery $.get(), use the response to update a progress bar, and calculate the expected time of arrival for the whole procedure:
importing [############## ] 48%, 32m 15s left
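On the PHP side, returning the state is a one-liner (the numbers in $import below are made-up sample values):

```php
<?php
// Send the current import state back to the progress-bar page.
$import = array('current' => 480, 'total' => 1000, 'errors' => 0);
header('Content-Type: application/json'); // ignored when run from the CLI
echo json_encode($import);
```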
This means that the upload page must move the CSV file to a temporary directory and immediately display the HTML that will show the progress bar. That HTML (and its JavaScript) is then responsible for driving the rest of the import.
There are also upload libraries such as Plupload that upload the file in chunks and display a progress bar of their own. Since a network upload is usually slower than the MySQL import, it is very convenient to import each chunk as soon as it arrives. The last CSV record in each chunk will likely be truncated, so some kind of check is required, and the fragment at the end of one chunk must be saved and prepended to the next chunk to reconstruct the record.
The Plupload method has the advantage of making the MySQL import apparently take zero time: as soon as the upload is finished, even with a very large file, the data is available. You put it into a table with a temporary name (not a MySQL TEMPORARY table, or you risk data loss), and when the upload is finished you simply rename the table atomically.
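The final swap can be a single atomic statement (the table names here are illustrative):

```sql
-- Atomically retire the live table and promote the freshly loaded one;
-- readers never see a moment where the table is missing or half-filled.
RENAME TABLE latitude_longitude     TO latitude_longitude_old,
             latitude_longitude_new TO latitude_longitude;
```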
Upvotes: 0
Reputation: 881
Create a unique key on (longitude, latitude), and then you can use something like the statement below:
LOAD DATA LOCAL INFILE 'c:\\temp\\filename.csv'
REPLACE
INTO TABLE table_name
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\r\n'
IGNORE 1 LINES
(@col1, @col2, @col3)
SET
    column1 = @col1,
    column2 = @col2,
    column3 = @col3;
This will replace rows that duplicate an existing (longitude, latitude) pair and insert the new ones.
Upvotes: 1