S. Dylan

Reputation: 58

How can I import a CSV file into a MySQL database more efficiently with PHP?

To explain: I have a Symfony2 project and I need to import users into my database via a CSV file. I have to do some work on the data before importing it into MySQL. I created a service for this and everything works fine, but it takes too much time to execute and slows down my server if I give it my entire file. My files usually have between 500 and 1500 rows, so I have to split them into files of ~200 rows and import those one by one.

I need to handle related users that can be in the file, already in the database, or both. A related user is usually the parent of a child.

Here is my simplified code:

$validator = $this->validator;

$members = array();
$children = array();
$mails = array();

$handle = fopen($filePath, 'r');
//Skip the header row
fgetcsv($handle, 0, ";");

while (($datas = fgetcsv($handle, 0, ";")) !== false) {

    $user = new User();

    //If there is a related user
    if($datas[18] != ""){
        $user->setRelatedMemberEmail($datas[18]);

        $relation = array_search(ucfirst(strtolower($datas[19])), UserComiti::$RELATIONSHIPS);
        if($relation !== false)
            $user->setParentRelationship($relation);
    }
    else {
        $user->setRelatedMemberEmail($datas[0]);
        $user->addRole ( "ROLE_MEMBER" );
    }

    //$mail, $lastName, etc. (and $line, $nbError, $mailParent used below) come from the $datas columns; extraction omitted for brevity
    $user->setEmail($mail);
    $user->setLastName($lastName);
    $user->setFirstName($firstName);
    $user->setGender($gender);
    $user->setBirthdate($birthdate);
    $user->setCity($city);
    $user->setPostalCode($zipCode);
    $user->setAddressLine1($adressLine1);
    $user->setAddressLine2($adressLine2);
    $user->setCountry($country);
    $user->setNationality($nationality);
    $user->setPhoneNumber($phone);

    //Entity Validation
    $listErrors = $validator->validate($user);

    //In case of errors
    if(count($listErrors) > 0) {
         foreach($listErrors as $error){
              $nbError++;
              $errors .= "Line " . $line . " : " . $error->getMessage() . "\n";
         }
    }
    else {
        if($mailParent != null)
            $children[] = $user;
        else {
            $members[] = $user;
            $nbAdded++;
        }
    }

    foreach($members as $user){
        $this->em->persist($user);
        $this->em->flush();
    }

    foreach($children as $child){

        //If the related user is already in DB
        //(Doctrine's findOneBy() returns null, not false, when nothing is found)
        $parent = $this->userRepo->findOneBy(array('username' => $child->getRelatedMemberEmail(), 'club' => $this->club));

        if ($parent !== null){

            //Check if someone related to the related user already has the same first and last name. If so, we can guess that this user is already created
            $testName = $this->userRepo->findByParentAndName($child->getFirstName(), $child->getLastName(), $parent, $this->club);

            if(!$testName){
                $child->setParent($parent);
                $this->em->persist($child);
                $nbAdded++;
            }
            else
                $nbSkipped++;
        }

        //Else, if the related user is neither in the file nor in the database, we create a placeholder user who will be able to update his profile later
        else{

            $newParent = clone $child;
            $newParent->setUsername($child->getRelatedMemberEmail());
            $newParent->setEmail($child->getRelatedMemberEmail());
            $newParent->setFirstName('Unknown');

            $this->em->persist($newParent);
            $child->setParent($newParent);
            $this->em->persist($child);

            $nbAdded += 2;
            $this->em->flush();
        }
    }
}

This is not my whole service, because I don't think the rest would help here, but ask me if you need more information.

Upvotes: 2

Views: 1018

Answers (2)

serek

Reputation: 103

I know this is a very old topic, but some time ago I created a bundle which can help import entities from a CSV file into the database. Maybe it will be helpful for someone who finds this topic.

https://github.com/jgrygierek/BatchEntityImportBundle
https://github.com/jgrygierek/SonataBatchEntityImportBundle

Upvotes: 0

Tim

Reputation: 875

While I do not have the means to quantitatively determine the bottlenecks in your program, I can suggest a couple of guidelines that will likely increase its performance significantly.

  1. Minimize the number of database commits you are making. A lot happens when you write to the database. Is it possible to commit only once at the end?

  2. Minimize the number of database reads you are making. Similar to the previous point, a lot happens when you read from the database.
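For example, with Doctrine (which your code appears to use via `$this->em`), you can persist entities as the row loop runs and flush in batches instead of once per user. This is only a sketch of that pattern, not your full logic: `$users` stands in for the entities built from the CSV, and the batch size of 100 is an illustrative value to tune.

```php
<?php
// Sketch: persist everything first, flush in batches, instead of
// calling flush() inside the row loop. Assumes $em is a Doctrine
// EntityManager and $users is an array of User entities from the CSV.
$batchSize = 100; // illustrative value; tune for your data

foreach ($users as $i => $user) {
    $em->persist($user);

    // Flush (and clear) every $batchSize rows to keep memory bounded
    if (($i + 1) % $batchSize === 0) {
        $em->flush();
        $em->clear(); // detach managed entities so they can be garbage-collected
    }
}

// Flush whatever remains after the last full batch
$em->flush();
$em->clear();
```

Note that `clear()` detaches all managed entities, so any references resolved before a `clear()` (such as a child's parent) would need to be re-fetched afterwards.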


If after considering the above points you still have issues, determine what SQL the ORM is actually generating and executing. ORMs work great until efficiency becomes a problem and more care needs to go into ensuring optimal queries are being generated. At this point, becoming more familiar with the ORM and SQL would be beneficial.
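With Doctrine, one quick way to see the generated SQL (assuming DBAL 2.x, where `Doctrine\DBAL\Logging\EchoSQLLogger` is available) is to attach a logger to the connection; in a Symfony2 dev environment, the web profiler toolbar also lists every executed query.

```php
<?php
// Echo every SQL statement Doctrine executes (DBAL 2.x).
// Assumes $em is the same EntityManager used in the question.
$em->getConnection()
   ->getConfiguration()
   ->setSQLLogger(new \Doctrine\DBAL\Logging\EchoSQLLogger());
// From here on, each query is printed to stdout as it runs.
```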


You don't seem to be working with too much data, but if you were, MySQL alone supports reading CSV files.

The LOAD DATA INFILE statement reads rows from a text file into a table at a very high speed. https://dev.mysql.com/doc/refman/5.7/en/load-data.html

You may be able to access this MySQL specific feature through your ORM, but if not, you would need to write some plain SQL to utilize it. Since you need to modify the data you are reading from the CSV, you would likely be able to do this very, very quickly by following these steps:

  1. Use LOAD DATA INFILE to read the CSV into a temporary table.
  2. Manipulate the data in the temporary table and other tables as required.
  3. SELECT the data from the temporary table into your destination table.
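A sketch of those three steps as plain SQL issued through PDO; the connection details, table names, column names, and file path are all illustrative, and `LOCAL` may or may not be needed depending on where the file lives relative to the MySQL server.

```php
<?php
// Sketch only: staging-table approach with LOAD DATA INFILE via PDO.
// All names below (staging_users, users, columns, DSN) are placeholders.
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'password');

// 1. Load the raw CSV into a temporary staging table.
$pdo->exec("CREATE TEMPORARY TABLE staging_users (
    email VARCHAR(255),
    last_name VARCHAR(255),
    first_name VARCHAR(255)
    -- one column per CSV field
)");
$pdo->exec("LOAD DATA LOCAL INFILE '/path/to/users.csv'
    INTO TABLE staging_users
    FIELDS TERMINATED BY ';'
    IGNORE 1 LINES");

// 2. Manipulate the staged data as required (example: normalize emails).
$pdo->exec("UPDATE staging_users SET email = LOWER(email)");

// 3. SELECT the cleaned rows into the destination table.
$pdo->exec("INSERT INTO users (email, last_name, first_name)
    SELECT email, last_name, first_name FROM staging_users");
```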

Upvotes: 1
