Mead Umandal

Reputation: 373

How to import huge CSV file with 200,000 rows to MySQL (asynchronous and fast)?

I have to write a PHP script that will import data from a given CSV file into a MySQL database. The given CSV file can contain up to 200,000 rows. I tried the following, but problems arose:

  1. LOAD DATA LOCAL INFILE : I cannot use the LOAD DATA LOCAL INFILE statement because I want to run some validations BEFORE inserting the rows; besides, our DB admin doesn't want me to use that statement and I don't know why.
  2. FOR LOOP : Inserting line by line inside a FOR loop takes too much time, resulting in a connection timeout.

Now, I am thinking of a solution: splitting the CSV file into smaller chunks, then inserting them asynchronously. I am already done with splitting the CSV, but I currently have no idea how to insert the chunks into my database in a quick and safe way. I heard that I will be using Ajax here.

Any solution you can recommend? Thanks a lot in advance!

Upvotes: 8

Views: 31430

Answers (4)

Mead Umandal

Reputation: 373

Thanks to everyone who gave answers to this question. I have discovered a solution! Just wanted to share it, in case someone needs to create a PHP script that will import a huge CSV file into a MySQL database (asynchronously and fast!). I have tested my code with 400,000 rows and the import is done in seconds. I believe it would work with larger files; you just have to modify the maximum upload file size.

In this example, I will be importing a CSV file that contains two columns (name, contact_number) into a MySQL table that has the same columns.

Your CSV file should look like this:

Ana, 0906123489

John, 0908989199

Peter, 0908298392

...

...

So, here's the solution.

First, create your table

CREATE TABLE `testdb`.`table_test` (
  `id` INT NOT NULL AUTO_INCREMENT,
  `name` VARCHAR(100) NOT NULL,
  `contact_number` VARCHAR(100) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE = InnoDB;

Second, I have 4 PHP files. All you have to do is place them in a single folder. The PHP files are as follows:

index.php

<form action="upload.php" method="post" enctype="multipart/form-data">
    <input type="file" name="csv" value="" />
    <input type="submit" name="submit" value="Save" />
</form>

connect.php

<?php
//modify your connections here
$servername = "localhost";
$username = "root";
$password = "";
$dbname = "testDB";
$conn = new mysqli($servername, $username, $password, $dbname);
if ($conn->connect_error) {
    die("Connection failed: " . $conn->connect_error);
} 
?>

senddata.php

<?php
include('connect.php');

$data = $_POST['file']; //name of the chunk file to import
$handle = fopen($data, "r");
if ($handle) {
    //instead of executing the queries one by one,
    //prepare a single SQL query that inserts the whole batch at once
    $sql = "INSERT INTO table_test(name,contact_number) VALUES ";
    while (($line = fgets($handle)) !== false) {
        $sql .= "($line),";
    }
    $sql = rtrim($sql, ','); //remove the trailing comma
    if ($conn->query($sql) === TRUE) {
        //batch inserted successfully
    } else {
        //insert failed, e.g. log $conn->error here
    }
    fclose($handle);
}
//unlink the chunk file once it is already imported to DB to clear the directory
unlink($data);
?>

upload.php

<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/1.11.1/jquery.js"></script>
<script>
//declaration of the function that sends one chunk file to senddata.php, which inserts it into the database
function senddata(filename){
    var file = filename;
    $.ajax({
        type: "POST",
        url: "senddata.php",
        data: { file: file },
        async: true,
        success: function(html){
            $("#result").html(html);
        }
    });
}
</script>
<?php
$batchsize = 1000; //split the huge CSV file into chunks of 1,000 rows; you can modify this based on your needs
if($_FILES['csv']['error'] == 0){
    $name = $_FILES['csv']['name'];
    $ext = strtolower(pathinfo($name, PATHINFO_EXTENSION));
    $tmpName = $_FILES['csv']['tmp_name'];
    if($ext === 'csv'){ //check if the uploaded file is of CSV format
        if(($handle = fopen($tmpName, 'r')) !== FALSE) {
            set_time_limit(0);
            $row = 0;
            while(($data = fgetcsv($handle)) !== FALSE) {
                //splitting of the CSV file : start a new chunk file at every batch boundary
                if ($row % $batchsize == 0):
                    if (!empty($file)) {
                        fclose($file); //close the previous chunk before starting a new one
                    }
                    $file = fopen("minpoints$row.csv","w");
                endif;
                $min = $data[0];
                $points = $data[1];
                $values = "'$min', '$points'";
                fwrite($file, $values.PHP_EOL);
                //sending the split CSV files, batch by batch...
                if ($row % $batchsize == 0):
                    echo "<script> senddata('minpoints$row.csv'); </script>";
                endif;
                $row++;
            }
            fclose($file);
            fclose($handle);
        }
    }
    else
    {
        echo "Only CSV files are allowed.";
    }
    //alert once done.
    echo "<script> alert('CSV imported!') </script>";
}
?>

That's it! You now have a pure PHP script that can import hundreds of thousands of rows in seconds! :) (Thanks to my partner who taught me and gave me the idea of how to use Ajax.)

Upvotes: 24

flaschenpost

Reputation: 2235

The main slowness comes from sending every single line as its own request. I would suggest sending the query with every 1000 or 500 rows, in the same format used by mysqldump --opt, so build a long string in the way

 insert into datatable (name, prename, comment) 
   values ('wurst', 'hans', 'someone')
   , ('bush', 'george', 'otherone')
   , ...
   ;

You should check how long your queries are allowed to be (the max_allowed_packet setting), or, if the MySQL server is under your control, you could extend the maximum query length.
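
For example, a minimal sketch of checking that limit (assuming a mysqli connection in $conn, like the one in the accepted answer's connect.php) so you can pick a batch size that stays safely below it:

<?php
// Sketch: read the server's max_allowed_packet so you can pick a safe batch size.
// Assumes an existing mysqli connection in $conn (e.g. from connect.php above).
$result = $conn->query("SHOW VARIABLES LIKE 'max_allowed_packet'");
if ($result && ($row = $result->fetch_assoc())) {
    $maxPacketBytes = (int) $row['Value'];
    echo "max_allowed_packet: $maxPacketBytes bytes\n";
    // Keep each multi-row INSERT comfortably below this limit,
    // e.g. flush the batch once the built query string nears the threshold.
}
?>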

If this still takes too long (and I mean 200k rows is not much at all), then you could try to improve the CSV reading.

It is a bit of work to split the inserts into those chunks, but you could write a small chunk class for this, so adding the rows gets a bit easier.

The usage of this class could look like this (a sketch of such a class follows after the snippet):

$chunk->prepare("insert into datatable (name, prename, comment) values");
$chunk->setSize(1000);

foreach ($row...){
   if($query = $chunk->addRow(...)){
       callUpdate($query);
   }
}
if($query = $chunk->clear()){
  callUpdate($query);
}

Upvotes: 2

SatanicGeek

Reputation: 342

You can use fgetcsv() with PHP.

Here is an example:

// Open the file for reading with PHP
$oFile = fopen('PATH_TO_FILE', 'r');

// Browse your csv line per line
while (($aRow = fgetcsv($oFile)) !== false) {

    $sReqInsertData = ' INSERT
                        INTO
                            TABLENAME
                        SET
                            FIELD1 = "'.$aRow[0].'",
                            FIELD2 = "'.$aRow[1].'",
                            FIELD3 = "'.$aRow[2].'",
                            FIELD4 = "'.$aRow[3].'",
                            FIELD5 = "'.$aRow[4].'",
                            FIELD6 = "'.$aRow[5].'",
                            FIELD7 = "'.$aRow[6].'",
                            FIELD8 = "'.$aRow[7].'"';

    // Execute your sql with mysqli_query ($oLink being your mysqli connection) or something like this
    mysqli_query($oLink, $sReqInsertData);
}

// Close your file
fclose($oFile);

Upvotes: 0

Julio Soares

Reputation: 1190

I would still use LOAD DATA LOCAL INFILE into a temporary staging table, then use MySQL to validate, filter, clean, etc. with all the data already in the DB, and finally populate the destination table with the ready-to-go records. A sketch of this approach is below.
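
A minimal sketch of that staging-table approach (assuming the table_test schema from the accepted answer and its mysqli connection in $conn; the staging table name, the 'data.csv' path, and the WHERE rule are placeholders, and local_infile must be enabled on both client and server for LOAD DATA LOCAL INFILE to work):

<?php
include('connect.php'); //the mysqli connection from the accepted answer

// 1. Load the raw CSV into a staging table as-is (fast, single statement).
//    'data.csv' is a placeholder path for the uploaded file.
$conn->query("CREATE TABLE IF NOT EXISTS staging_test (
    name VARCHAR(100),
    contact_number VARCHAR(100)
)");
$conn->query("LOAD DATA LOCAL INFILE 'data.csv'
    INTO TABLE staging_test
    FIELDS TERMINATED BY ','
    LINES TERMINATED BY '\\n'
    (name, contact_number)");

// 2. Validate/filter/clean inside MySQL, then copy only the good rows
//    into the destination table (the WHERE clause is just an example rule).
$conn->query("INSERT INTO table_test (name, contact_number)
    SELECT TRIM(name), TRIM(contact_number)
    FROM staging_test
    WHERE contact_number REGEXP '^[0-9]+$'");

// 3. Drop the staging table once the import is done.
$conn->query("DROP TABLE staging_test");
?>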

Upvotes: 0
