PrimuS
PrimuS

Reputation: 2683

Large Excel import to DB gets very slow in Symfony

I have a script that imports a large Excel file with a lot of foreaches and after 50 something iterations it gets unbearably slow... Can I improve this somehow?

I try to make it as readable as possible with this:

foreach worksheet (approx 20) {
    NEW DB ENTRY, PERSIST, FLUSH (account)
    foreach row (10-100){
        NEW DB ENTRY, PERSIST, FLUSH (object)
        foreach column (approx. 10){
            CREATE NEW DB ENTRY, FOREIGN KEY to 'object', PERSIST, FLUSH (weekdates)
        }
        foreach column (approx. 50){
            CREATE NEW DB ENTRY, FOREIGN KEY to 'object', PERSIST, FLUSH (scheduleEntry)

            CREATE NEW DB ENTRY, FOREIGN KEY to 'scheduleEntry', PERSIST, FLUSH (scheduleObject)

            CREATE NEW DB ENTRY, FOREIGN KEY to 'scheduleObject', PERSIST, FLUSH (scheduleModule)

           /* WORST CASE IS THAT HERE WE HAVE FLUSHED 100000 times */
        }
    }
}

Is there a way to fasten up especially the last foreach? I think I need to flush everytime as I have to FOREIGN KEY the previous entry to the new one, am I right? By slow I mean that the excel file takes 24+ hours to import. It had about the numbers in the example.

The actual (still simplyfied) code looks sth like this

/* Create Excel */
$excel = $this->getContainer()->get('phpexcel')->createPHPExcelObject(Constants::FULL_PATH . 'excel/touren_' . $filename . '.xls');
$sheets = $excel->getAllSheets();
foreach ($sheets as $id => $sheet) {
    $ws = $sheet->toArray();

    /* Read sth from first line and create an 'account' from this */
    $n = new Network();
    ....
    $em->persist($n);

    try {
        $em->flush();
        $output->writeln('----><info>Inserted in DB</info>');
    } catch (Exception $e) {
        $output->writeln('----><error>DB ERROR</error>');
    }

    /* Go through all rows of current WorkSheet */
    foreach ($ws as $row) {
        /* Create new Object */
        $object = new Object();
        ...
        $em->persist($object);

        try {
            $em->flush();
            $output->writeln("------->Save Object to DB: <info>OK</info>");
        } catch (\Exception $e) {
            $output->writeln("------->Save Object to DB: <error>Failed: " . $e->getMessage() . "</error>");
        }

       /* Create new Tour for weekday/client */
       $tour = new Tour();
       $tour->setNetwork($n);

      /* More foreach */
      foreach ($clientKey as $filialNo => $filialKey) {
          $tourObject = new TourObject();
          $tourObject->setTour($tour);
          $tourObject->setObject($o);
          $em->persist($tourObject);


         /* Count Intervals */
        foreach ($filialKey as $tasks) {
            if (!$tourObject->getModule()->contains($module)) {
                $tourObject->addModule($module);
                $em->persist($tourObject);

                /* More foreach */
                foreach ($period as $date) {
                    $schedule = new Schedule();
                    $schedule->setTour($tour);
                    ....
                    $em->persist($schedule);
                    try {
                        $em->flush();
                        $output->writeln("------->Save Schedule to DB: <info>OK</info>");
                    } catch (\Exception $e) {
                        $output->writeln("------->Save Schedule to DB: <error>Failed: " . $e->getMessage() . "</error>");
                    }


                    $scheduleObject = new ScheduleObject();
                    $scheduleObject->setSchedule($schedule);
                    ....
                    $em->persist($scheduleObject);
                    try {
                        $em->flush();
                        $output->writeln("------->Save ScheduleObject to DB: <info>OK</info>");
                    } catch (\Exception $e) {
                        $output->writeln("------->Save ScheduleObject to DB: <error>Failed: " . $e->getMessage() . "</error>");
                    }

                    $scheduleObjectModule = new ScheduleObjectModule();
                    $scheduleObjectModule->setScheduleObject($scheduleObject);
                    $em->persist($scheduleObjectModule);
                    try {
                        $em->flush();                                                               
                        $output->writeln("------->Save ScheduleObjectModule to DB: <info>OK</info>");
                    } catch (\Exception $e) {
                        $output->writeln("------->Save ScheduleObjectModule to DB: <error>Failed: " . $e->getMessage() . "</error>");
                    }
                }
            }
        }
      }

      /* Flush all?!? */
      try {
            $em->flush();
            $output->writeln("------->Save Task to DB: <info>OK</info>");
      } catch (\Exception $e) {
            $output->writeln("------->Save Task to DB: <error>Failed: " . $e->getMessage() . "</error>");
      }
    }

Upvotes: 0

Views: 431

Answers (2)

Arne
Arne

Reputation: 381

Every entity you create/persist through the EntityManager is stored in the UnitOfWork and now became a "managed" entity. If this UnitOfWork fills up, its fairly heavy on the system. You could call $entityManager->clear() after each "sheet" so that the UoW gets cleared after each iteration.

Each entity has its own UnitOfWork, and you can clear the UoW for each entity separately, but since you create lots of entities, i would suggest not specifying an entity class and just clearing all of them.

  ...
  /* Flush all?!? */
  try {
        $em->flush();
        $em->clear();
        $output->writeln("------->Save Task to DB: <info>OK</info>");
  } catch (\Exception $e) {
        $output->writeln("------->Save Task to DB: <error>Failed: " . $e->getMessage() . "</error>");
  }

Or you could use native queries to insert in your DB, but that might not always be what you want in terms of data consistency etc.etc.

Also as pointed out above, you don't need to flush after each entity. If you call flush only once, after each 'sheet', Doctrine will do all insert statements at once.

Upvotes: 2

Youssef
Youssef

Reputation: 1043

I think a good solution is to use a native DB utility for this (like Mysql Load data infile)

This is going to be a lot faster than anything you can write in PHP.

Upvotes: 0

Related Questions