Blacky
Blacky

Reputation: 11

Importing big CSV files to MySQL database using PHP with checking for duplicates

I hope you can help me. I searched a lot, but unfortunately didn't find anything. What's the problem? I've got big CSV files with 1 column, which contains e-mail addresses. There are about 50000 lines in single file. I'm creating administration panel, which allows to import these files to the server, using HTML form and PHP. Importing CSV to MySQL database through PHP is simple, but I need something more - check for every e-mail does it exists, and if yes - skip it. What's the problem? Table has about million+ records, checking one e-mail lasts +/- 3 seconds. 50000 records multiplied by 3... it's gonna take min. 44 hours! PHP script stops responding after less than 10 minutes... So it's impossible do it this way:

function doesExist($email) {
    $sql = "SELECT count(*) as counter FROM mailing_subscribers WHERE subscriber_email LIKE :subscriber_email";
    $sth = $this->db->prepare($sql);
    $sth->execute(array(':subscriber_email' => $email));
    $row = $sth->fetch();
    $counter = $row->counter;
    if ($counter > 0) {
        return true;
    } else {
        return false;
    }
}

function importCSV($file,$group) {            

    $fp = fopen($file['tmp_name'], "r");
    $importsCounter = 0;

    while($csv_line = fgetcsv($fp)) {
        for ($i = 0, $j = count($csv_line); $i < $j; $i++) {
            if (!$this->doesExist($csv_line[$i])) {
                $sql = "INSERT INTO mailing_subscribers(subscriber_email,subscriber_group) VALUES('".$csv_line[$i]."','".$group."')";
                $sth = $this->db->prepare($sql);
                $sth->execute();
                $importsCounter++;
            }
        }
    }

    $_SESSION["feedback_positive"][] = FEEDBACK_FILE_IMPORT_SUCCESSFUL . " Utworzonych wpisów: " . $importsCounter;
}

$file is a $_FILE array.

Is there any other and faster method to do it?

Upvotes: 1

Views: 1252

Answers (1)

Ronak Shah
Ronak Shah

Reputation: 1549

Below is my suggestion:

1) Load your csv file in Temporary table. refer http://dev.mysql.com/doc/refman/5.1/en/load-data.html

2) It will load your bulk csv data very fast may be in seconds. Now use insert query and insert data from temporary table to master table with duplicate value check.

E.G

1) Lets assume you have load csv data in temporary table named "TempTable"

2) say your master table name is "mailing_subscribers"

3) say you do not want duplicate record to be insert.

your query will be like:

insert into mailing_subscribers (subscriber_email,cola,colb..) select subscriber_email,cola,colb.. from TempTable where subscriber_email not in (select subscriber_email from mailing_subscribers)

Please let me know if you face any issue.

Upvotes: 1

Related Questions